metamaden / recountmethylation

Utilities to access and analyze harmonized databases of public DNAm arrays from GEO.
https://recount.bio/data/

Documentation for 0-0-1, 0-0-2, 0-0-3 labels in https://recount.bio/ #22

Closed (bernardo-heberle closed 3 months ago)

bernardo-heberle commented 7 months ago

Hello,

Thank you for compiling all these methylation data; this is a major resource!

I am working on a project where I will use some of the data you compiled and stored under: https://recount.bio/

However, I am a bit confused about the data labels, particularly the 0-0-1 vs 0-0-2 vs 0-0-3 labels for the directories in https://recount.bio/

For example, I noticed that the assays.h5 file under https://recount.bio/remethdb_h5se-gm_hm450k_0-0-2_1607018051/ is significantly larger than the assays.h5 file under https://recount.bio/remethdb_hm450k_h5se_gm_1669220733_0-0-3/

Can you provide an explanation for what those labels mean?

Thank you for your help, Bernardo

bernardo-heberle commented 7 months ago

Just a kind reminder about this issue.

metamaden commented 3 months ago

Hi there,

Thank you for your interest in this project.

For details about the contents of the compilation files, including file sizes and number of samples by array platform, please consult the recountmethylation User's Guide sections 1.1 and 1.2 (recountmethylation_users_guide).

The short explanation is that we compiled the samples available for each platform within a given time interval. In the interval between completing compilations v0.0.2 and v0.0.3, more new EPIC/HM850K array samples were published to GEO than HM450K samples, and the total number of HM450K samples was lower in v0.0.3 than in either v0.0.2 or v0.0.1.

The filenames include some details about their contents, such as platform ("epic" or "hm450k"), version ("0-0-1", "0-0-2", or "0-0-3"), data format ("h5" or "h5se"), and processing level ("rg" for RGChannelSet, "gr" for GenomicRanges, or "gm" for GenomicMethylSet).
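As a rough illustration of the naming scheme described above, here is a minimal Python sketch that pulls those fields out of a directory name. The function name `parse_compilation_name` and the token-matching approach are assumptions made for illustration and are not part of recountmethylation. The sketch only recognizes the values listed above and ignores everything else, such as the "remethdb" prefix and the trailing number in the directory names.

```python
import re

# Values taken from the naming description above; anything else is ignored.
PLATFORMS = {"epic", "hm450k"}
FORMATS = {"h5", "h5se"}
LEVELS = {"rg", "gr", "gm"}
VERSION_RE = re.compile(r"\d+-\d+-\d+")  # e.g. "0-0-2"


def parse_compilation_name(name):
    """Hypothetical helper: extract platform, version, format, and level."""
    name = name.strip("/").lower()

    # Find the version first, because it contains hyphens that would
    # otherwise be split apart in the tokenizing step below.
    match = VERSION_RE.search(name)
    version = match.group(0) if match else None
    rest = VERSION_RE.sub("", name)

    # Remaining fields are separated by "_", "-", or "." in the observed names.
    tokens = [t for t in re.split(r"[_\-.]", rest) if t]

    def first_in(allowed):
        # Return the first token that belongs to the allowed set, else None.
        return next((t for t in tokens if t in allowed), None)

    return {
        "platform": first_in(PLATFORMS),
        "version": version,
        "format": first_in(FORMATS),
        "level": first_in(LEVELS),
    }


if __name__ == "__main__":
    for d in ("remethdb_h5se-gm_hm450k_0-0-2_1607018051",
              "remethdb_hm450k_h5se_gm_1669220733_0-0-3"):
        print(d, "->", parse_compilation_name(d))
```

Because the sketch matches tokens by value rather than by position, it handles both directory names quoted in this thread even though their fields appear in a different order.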

Please let me know if you have any further questions.

Best regards,

Sean