include-dcc / DMC_v3_tasks

Issues for DMC v3 project board

Determine data types to be processed #28

Open ByroneCole-SageBionetworks opened 1 year ago

ByroneCole-SageBionetworks commented 1 year ago

Edit: Since we aren't so sure about the metadata available, this might be more helpful:

  1. enumerate anticipated data types;
  2. determine which data types we already have existing workflows to support;
  3. determine which data types we already have models/schemas to collect metadata;
  4. map out a plan to fill in any gaps for both data models and workflows (based on downstream use cases)
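
The four steps above amount to a set comparison, sketched below. This is only an illustration: the data type names and the schema coverage are hypothetical placeholders, and the only coverage stated in this thread is that WGS and RNAseq already have workflows.

```python
# Minimal gap-analysis sketch. All three sets are illustrative assumptions,
# not the project's actual inventory.
anticipated = {"WGS", "RNAseq", "Proteomics", "CyTOF"}   # step 1
has_workflow = {"WGS", "RNAseq"}                         # step 2
has_schema = {"WGS", "RNAseq", "Proteomics"}             # step 3 (hypothetical)


def find_gaps(anticipated, has_workflow, has_schema):
    """Step 4: list the anticipated data types that lack a workflow or a schema."""
    return {
        "missing_workflow": sorted(anticipated - has_workflow),
        "missing_schema": sorted(anticipated - has_schema),
    }


gaps = find_gaps(anticipated, has_workflow, has_schema)
for kind, data_types in gaps.items():
    print(f"{kind}: {', '.join(data_types) or 'none'}")
```

Keeping the three inventories as separate sets means each can be updated independently as cohorts confirm what they will send.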

Internal JIRA tickets:

thomasyu888 commented 1 year ago

@ByroneCole-SageBionetworks, @kjflynn: Thanks for setting this up. Do you know who I should reach out to about the new genomic data types being supported in INCLUDE?

kjflynn commented 1 year ago

Hi @thomasyu888, do you mean for V3 or generally? For both, actually, you should probably start with @lopierra.

thomasyu888 commented 1 year ago

@kjflynn For V3 and just generally. Thanks!

lopierra commented 1 year ago

Did you mean new data types, or just new data? I don't think we have new genomic data types, just WGS and RNAseq as last time. We will have WGS from de Smith, Hakonarson, and HTP, and RNAseq from Hakonarson and HTP.

thomasyu888 commented 1 year ago

Thanks @lopierra . I meant new data types.

thomasyu888 commented 1 year ago

Had a discussion internally, and this is the summary:

thomasyu888 commented 1 year ago

I have some questions.

kjflynn commented 1 year ago

Hey Tom,

For your second question, these are three different immunological data types.

Flow cytometry sorts and counts fluorescently labeled cells. CyTOF (often called mass cytometry) is a higher-dimensional form of flow cytometry that uses heavy-metal labeling to sort and count cells. Cytokine profiling measures secreted immune-signaling proteins.

I’ll let Pierrette answer the other two bullets.

On Sun, Feb 5, 2023 at 4:59 PM Thomas Yu @.***> wrote:

I have some questions.

  • Is there a difference between R01 metabolomics and metabolomics? If so, what is it?
  • What is the difference between Flow Cytometry, CyTOF and Cytokine profiles?
  • Did we have a list of the "other sequencing?"

thomasyu888 commented 1 year ago

Thanks! I took what was in this spreadsheet, counted the unique cohorts per value in the data_type column (excluding the cognitive and clinical values of data_type_short), and got these counts.

| Data Type | # cohorts |
| --- | --- |
| Other sequencing (targeted, GWAS, DNA methylation, etc.) | 6 |
| Neuroimaging | 4 |
| Metabolomics/R01 metabolomics | 4 |
| Cytokine profiles | 3 |
| Proteomics | 3 |
| CyTOF | 2 |
| EEG | 1 |
| Head/neck MRI | 1 |
| Flow Cytometry | 1 |
| Pulse wave velocity | 1 |
| Sleep - summary, saturation, PSG, etc | 1 |
| Home & lab sleep apnea test (Nox A1) | 1 |
| sleep - Actigraphy, PSG | 1 |
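
A count like this one can be reproduced roughly as follows. This is a sketch under assumptions: the spreadsheet is taken to be exported as CSV, the `data_type` and `data_type_short` column names come from the comment above, and the `cohort` column name and all sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of the spreadsheet; the rows are made up for illustration.
SAMPLE_CSV = """cohort,data_type,data_type_short
cohort_A,Proteomics,proteomics
cohort_B,Proteomics,proteomics
cohort_A,CyTOF,cytof
cohort_A,Cognitive testing,cognitive
cohort_B,Clinical labs,clinical
cohort_B,CyTOF,cytof
cohort_C,Proteomics,proteomics
"""

# Short types left out of the count, as described in the comment above.
EXCLUDED_SHORT_TYPES = {"cognitive", "clinical"}


def cohorts_per_data_type(rows):
    """Return (data_type, number of distinct cohorts), most common first."""
    cohorts = defaultdict(set)
    for row in rows:
        if row["data_type_short"].strip().lower() in EXCLUDED_SHORT_TYPES:
            continue
        cohorts[row["data_type"].strip()].add(row["cohort"].strip())
    return sorted(
        ((data_type, len(names)) for data_type, names in cohorts.items()),
        key=lambda item: (-item[1], item[0]),
    )


counts = cohorts_per_data_type(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for data_type, n_cohorts in counts:
    print(f"{data_type}\t{n_cohorts}")
```

Collecting cohorts into a set per data type means a cohort listed twice for the same data type is still counted once.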

For the data types that don't have defined workflows, I'm thinking we could try to find Nextflow or CWL workflows to run on the data as Cavatica applications. Some questions:

lopierra commented 1 year ago

The Assays tab in that same spreadsheet is a little more granular on different types of sequencing, etc.

"R01 metabolomics" just means metabolomics data from a previous R01 grant.

I'm not sure we have enough metadata currently to start setting up workflows. Like there are numerous types of proteomics, and I'm not sure what each cohort has, and probably wouldn't ask for the details until they're actually ready to send data.

I'm also not sure if the plan is to import and harmonize all the actual data, or just make the files available. I guess it depends on the data type and whether multiple cohorts are doing comparable assays that could be analyzed together.

thomasyu888 commented 1 year ago

Thanks @lopierra - this is very helpful!

This ticket is specifically to determine data types we would want processed along with whether or not there are existing bioinformatics workflows. I'll take a look at the assays tab and regenerate some numbers.

lopierra commented 1 year ago

Thanks for doing that! I just added a couple more assays for the Aldinger cohort (we just talked last week and I haven't had a chance to update her info in the other tabs yet). She will have single-cell RNAseq and genotyping of fetal tissue.

thomasyu888 commented 1 year ago

Sorry for the long delay. I took a look at the assays sheet, and it would be helpful to have a dictionary of assays that cohorts could choose from. That said, here are all the assays listed by more than one cohort (aside from RNAseq and WGS, which already have workflows):

| Assay | Number of Cohorts |
| --- | --- |
| Neuroimaging (volumetric MRI, fMRI, fNIRS, DTI, DSI) | 6 |
| Metabolomics / NMR metabolomics / P4C mass spec metabolomics / R01 metabolomics | 5 |
| Cytokine / MSD cytokine | 3 |
| SOMAscan proteomics / proteomics | 3 |
| CyTOF | 2 |
| amyloid-PET | 2 |
| tau-PET | 2 |
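
One way an assay dictionary could feed this kind of count is sketched below. The groupings mirror the slash-separated rows in the table above, but the canonical names, the `cohort_*` identifiers, and the sample records are hypothetical, not an agreed vocabulary.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: free-text label -> canonical assay name.
ASSAY_DICTIONARY = {
    "metabolomics": "Metabolomics",
    "nmr metabolomics": "Metabolomics",
    "r01 metabolomics": "Metabolomics",
    "cytokine": "Cytokine",
    "msd cytokine": "Cytokine",
    "somascan proteomics": "Proteomics",
    "proteomics": "Proteomics",
}


def normalize(label):
    """Map a free-text assay label to its canonical name, or None if unknown."""
    key = " ".join(label.lower().split())
    return ASSAY_DICTIONARY.get(key)


def shared_assays(records, min_cohorts=2):
    """Count distinct cohorts per canonical assay and keep the shared ones.

    Returns (shared, unmapped): `shared` maps assay -> cohort count for assays
    reported by at least `min_cohorts` cohorts; `unmapped` lists labels that
    are missing from the dictionary and need a curator's decision.
    """
    cohorts = defaultdict(set)
    unmapped = set()
    for cohort, label in records:
        assay = normalize(label)
        if assay is None:
            unmapped.add(label)
        else:
            cohorts[assay].add(cohort)
    shared = {a: len(c) for a, c in cohorts.items() if len(c) >= min_cohorts}
    return shared, sorted(unmapped)


# Made-up submissions for illustration only.
records = [
    ("cohort_A", "NMR Metabolomics"),
    ("cohort_B", "R01 metabolomics"),
    ("cohort_A", "MSD cytokine"),
    ("cohort_C", "SOMAscan proteomics"),
    ("cohort_A", "proteomics"),
    ("cohort_B", "tau-PET"),
]
shared, unmapped = shared_assays(records)
print(shared, unmapped)
```

Returning the unmapped labels separately keeps new or misspelled assay names visible instead of silently dropping them from the counts.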

Are these still true:

> I'm not sure we have enough metadata currently to start setting up workflows. Like there are numerous types of proteomics, and I'm not sure what each cohort has, and probably wouldn't ask for the details until they're actually ready to send data.

> I'm also not sure if the plan is to import and harmonize all the actual data, or just make the files available. I guess it depends on the data type and whether multiple cohorts are doing comparable assays that could be analyzed together.

lopierra commented 1 year ago

We have not gotten any more assay data since the Oct 2022 release. However, Korenberg is getting ready to send us data - they have RNAseq, methylation, MRI imaging, cognitive tests, and lab data. I still don't know about harmonization vs. making files available - we should bring this up at Data Implementers at some point. An additional complication is that ABC-DS will not allow their assay data to be displayed in the portal, so I'm not sure we should even count those in the number of cohorts.

thomasyu888 commented 1 year ago

Ah, I see. Thanks for the update - I'll bring this up at the Data Implementers meeting soon!