Closed wongdanr closed 2 years ago
Hi @wongdanr, DeepProfiler features are currently available only for ten plates.
Thanks @niranjchandrasekaran appreciate it. Are there any plans to generate the rest of the profiles for the other plates? I'm trying to generate them myself, but I'm new to DeepProfiler and I'm learning that it is not that easy to do.
Hi @wongdanr, at the moment we are not planning to generate the DeepProfiler features for the rest of the plates. Please continue to ask your questions either in the repo for the handbook or in the slack channel and I am sure the DeepProfiler users will be able to help you.
Thanks @niranjchandrasekaran. In preparation for applying DeepProfiler to this dataset, I need single cell locations and their corresponding well and site. From the sqlite files provided in the Step 2's README's S3 bucket, I don't see any mapping from the single cell features to the corresponding well and site (only mappings from single cell features to things like index, TableNumber, ImageNumber, etc.). Is this mapping to well/site available? Thanks!
I am tagging in @johnarevalo, who will likely know how to extract the single cell locations from the SQLite file.
@wongdanr There should be a bunch of columns for Metadata, including `Metadata_Well` and `Metadata_Site`; there should also be a couple of different columns (not entirely identical, but sufficiently so for these purposes) for `Location_Center_X` and `Location_Center_Y`.
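To make that concrete, here is a minimal sketch of joining single cells to their well and site, assuming the usual CellProfiler SQLite layout: an `Image` table carrying per-site metadata (`Metadata_Well`, `Metadata_Site`) and per-compartment tables (e.g. `Nuclei`) carrying the coordinates, both sharing `ImageNumber` as the join key. The table and column names below are illustrative stand-ins built in memory, not the real schema of the S3 files:

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for a CellProfiler SQLite export: the Image table
# carries per-site metadata, the Nuclei table carries single-cell
# coordinates, and ImageNumber links the two.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Image (ImageNumber INTEGER, "
            "Metadata_Well TEXT, Metadata_Site INTEGER)")
con.execute("CREATE TABLE Nuclei (ImageNumber INTEGER, ObjectNumber INTEGER, "
            "Nuclei_Location_Center_X REAL, Nuclei_Location_Center_Y REAL)")
con.executemany("INSERT INTO Image VALUES (?, ?, ?)",
                [(1, "A01", 1), (2, "A01", 2)])
con.executemany("INSERT INTO Nuclei VALUES (?, ?, ?, ?)",
                [(1, 1, 10.0, 20.0), (1, 2, 30.0, 40.0), (2, 1, 50.0, 60.0)])

# Join each single cell to its well and site via ImageNumber.
query = """
SELECT i.Metadata_Well, i.Metadata_Site, n.ObjectNumber,
       n.Nuclei_Location_Center_X, n.Nuclei_Location_Center_Y
FROM Nuclei n JOIN Image i ON n.ImageNumber = i.ImageNumber
"""
cells = pd.read_sql_query(query, con)
print(cells)
```

Against the real files you would point `sqlite3.connect` at the downloaded `.sqlite` file and adapt the table/column names to whatever the export actually contains.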
Thanks @bethac07! Exactly what I was looking for. Does anyone know why there are 16 "sites" when there are only 9 image fields? How do you map sites to field? Aren't these synonymous?
Most plates had 9 sites, but some had more and some fewer; can you clarify which plate you are looking at? But yes, "site" and "field" are typically used interchangeably.
I see thank you very much!
Hi @niranjchandrasekaran where is the model you used to generate the DeepProfiler embeddings? Thanks!
Hi @wongdanr, John used EfficientNet with pretrained features. More details here and in the section, Deep learning feature extraction, of the manuscript.
Thank you @niranjchandrasekaran and @johnarevalo. The README says to download the pretrained model, but I don't see it in the deep_profiles/ directory. Can this be shared? Appreciate it!
Hi @wongdanr,
In the latest version of DeepProfiler, setting `checkpoint` to `"None"` (as a string) in the `profile` section of the `jump.json` file makes DeepProfiler automatically download the pretrained weights.
I'll open an issue to update the README.
Thank you @johnarevalo. Is it possible to provide the updated jump.json file used to generate the deep profiles? I tried using the one in deep_profiles/inputs/config/ but I think this might be out of date? Thank you!
We used the file you mentioned in this GitHub repo. Could you paste the output of the `profile` command after setting `checkpoint: "None"` in `jump.json`?
Thanks @johnarevalo, sorry it's working now actually. It looks like a new key needs to be added to the json file called 'label_smoothing' ('train':'model':'params':'label_smoothing').
Thanks for debugging this @wongdanr. DeepProfiler is still under development and backward-incompatible changes can be introduced without notice.
Thank you very much @johnarevalo, I was able to profile the CPJUMP1 compound data. From the README, I wasn't sure how the various profiles in outputs/results/features/ were aggregated. Did you simply take the median of the profiles within a well to get a well-level aggregation vector, and then apply pycytominer to the well-level median vectors to get the final deep profiles reported in the repo? Thanks!
We used the `build_profiles.py` script to aggregate the extracted features using the mean:
https://github.com/jump-cellpainting/2021_Chandrasekaran_submitted/blob/58583b45e01e06da7a642dd92b7f955e2fe37226/deep_profiles/utils/build_profiles.py#L25
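Conceptually, that aggregation step reduces the site- or cell-level feature matrix to one vector per well by averaging. A minimal pandas sketch (with made-up column names and only two features instead of the real 6400) is:

```python
import pandas as pd

# Toy single-cell feature table; the real DeepProfiler output has 6400
# EfficientNet feature columns, not two.
cells = pd.DataFrame({
    "Metadata_Plate": ["P1", "P1", "P1", "P1"],
    "Metadata_Well":  ["A01", "A01", "A02", "A02"],
    "f1": [1.0, 3.0, 5.0, 7.0],
    "f2": [2.0, 4.0, 6.0, 8.0],
})

# Mean-aggregate to one profile per plate/well, as build_profiles.py does.
profiles = cells.groupby(
    ["Metadata_Plate", "Metadata_Well"], as_index=False
).mean()
print(profiles)
```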
Thank you @johnarevalo. Once the .parquet file is created, how do I create the normalized profiles of all the plates using pycytominer? I cloned the repo neurips_cpjump1. It seems like the neurips_cpjump1/run.sh script processes only 10 of the plates. I'm not sure where I can specify the created .parquet file as an argument. Thanks!
It's great you have generated the features in the `.parquet` file! This file contains 2 metadata columns (`Metadata_Plate`, `Metadata_Well`) and 6400 feature columns, as described in the README. The `PLATE_ID.csv.gz` files are just subsets of the parquet file split by `Metadata_Plate`.
You can obtain such splits with pandas:
```python
import pandas as pd

# Split the well-level profiles into one compressed CSV per plate.
df = pd.read_parquet('profiles.parquet')
for plate_id, group in df.groupby('Metadata_Plate'):
    group.to_csv(f'{plate_id}.csv.gz', compression='gzip', index=False)
```
I haven't tested it, but I guess you get the idea.
Great thank you! I'm wondering more about how to generate the normalized versions of those though, such as the "augmented" and "spherized" versions that are included in the repo. @johnarevalo
You can follow the profiling recipe repo to generate the augmented profiles and to run the rest of the pipeline.
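In the recipe, normalization is handled by pycytominer's `normalize()`. The core idea behind the standalone normalized profiles is a per-plate z-score of each feature; the "spherized" profiles instead use whitening (`method="spherize"` in pycytominer). A rough pandas equivalent of per-plate standardization, using toy data and made-up feature names (pycytominer additionally supports normalizing against negative controls only), is:

```python
import pandas as pd

# Toy well-level profiles for a single plate.
profiles = pd.DataFrame({
    "Metadata_Plate": ["P1", "P1", "P1", "P1"],
    "Metadata_Well":  ["A01", "A02", "A03", "A04"],
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [10.0, 20.0, 30.0, 40.0],
})
feature_cols = ["f1", "f2"]

# Per-plate z-score of each feature across all wells on the plate;
# roughly what pycytominer's normalize(method="standardize") computes.
normalized = profiles.copy()
normalized[feature_cols] = profiles.groupby("Metadata_Plate")[feature_cols].transform(
    lambda x: (x - x.mean()) / x.std(ddof=1)
)
print(normalized)
```

This is only a sketch of the standardization step; for the full pipeline (feature selection, sphering, batch correction) follow the profiling recipe itself.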
Thanks @johnarevalo, where can I find details about the pre-trained model that gets automatically downloaded? Was this model trained to classify drug perturbation type or plate? I see in the jump.json file that "label_field": "Treatment" but I also see that "targets": ["Metadata_Plate"] and I'm not quite sure which one is used for the classification label. Also was this model trained on just the JUMP1 compound data?
The `train` section in `jump.json` is considered only when the `train` operation runs. In this case we ran the `profile` operation to extract features using a model pretrained with ImageNet, so any values set in `train`, including the ones you mention, are ignored by DeepProfiler.
DeepProfiler relies on the efficientnet library; you can check the list of available models and the details of each there. DeepProfiler uses EfficientNetB0 by default.
Hello @niranjchandrasekaran, Are the (well-level) profiles from DeepProfiler available for download? Wondering where I can find them. Thank you!