lewismervin1 opened this issue 1 year ago.
Hi Lewis, I am tagging @shntnu, who should be able to tell you what our current plans are.
Thanks @niranjchandrasekaran and @shntnu. The reason we ask is that we can currently access only one of the (full-plate) parquet files, and we are missing the _feature_select_negcon_plate.csv.gz, _normalized_feature_select_plate.csv.gz, etc. files.
Hi Lewis, thanks for the additional context. Generating those additional files will require data alignment and normalization across all the sources, which we are still working on. Once we settle on an approach, we will provide either per-plate parquet versions of those files or a single parquet file with all the plates (to be decided).
Hi @niranjchandrasekaran, we were wondering whether a decision has been made on how these files should look, and whether this issue should be closed. Many thanks for your help.
@lewismervin1 thanks for checking in. We're still working on a data processing pipeline for getting all the JUMP data aligned.
"we will either have per-plate parquet versions of those files or a single parquet file with all the plates (to be decided)."
We will eventually provide per-plate parquet files, but the first few versions of the aligned data will be a single PyArrow Dataset.
Once we've finished implementing our new data validation system and schema (in progress here: https://github.com/broadinstitute/cpg), we will distribute them as per-plate parquet files (very likely using the same folder structure).
"We will eventually provide per-plate parquet files, but the first few versions of the aligned data will be a single PyArrow Dataset."
@lewismervin1 This is now available (the PR is still open, but you can already take a peek).
We noticed that the files in the expected workspace folder structure for profiles (https://github.com/broadinstitute/cellpainting-gallery/blob/main/folder_structure.md#profiles-folder-structure), i.e.:

are actually directories of single parquet files (similar to the ones expected in workspace_dl, https://github.com/broadinstitute/cellpainting-gallery/blob/main/folder_structure.md#profiles-folder-structure-1). Is this expected, or does folder_structure.md need updating?

Many thanks for any help!