iRNA-COSI / APAeval

Community effort to evaluate computational methods for the detection and quantification of poly(A) sites and estimating their differential usage across RNA-seq samples
MIT License

OpenEBench summary workflow: Review Q2 reference datasets #145

Closed AsierGonzalez closed 2 years ago

AsierGonzalez commented 3 years ago

As per the discussion on Slack and the meeting today:

There are currently nearly 60 reference datasets for Q2:

HEK293_none_R1.SRR3543897.Aseq.hg38.bed       Mayr_CD5B_R5.SRR6795686.3seq.hg38.bed         Mayr_GC_group.MultiSRR.3seq.hg38.bed        Mayr_NB_R3.SRR6795688.3seq.hg38.bed               P19_siControl_R1.SRR11918617.MACEseq.mm10.bed
HEK293_none_R2.SRR3543945.Aseq.hg38.bed       Mayr_CD5B_R5.SRR6795686.3seqTEonly.hg38.bed   Mayr_GC_group.MultiSRR.3seqTEonly.hg38.bed  Mayr_NB_R3.SRR6795688.3seqTEonly.hg38.bed         P19_siControl_R2.SRR11918618.MACEseq.mm10.bed
HEK293_siControl_R1.SRR2922409.Aseq.hg38.bed  Mayr_CD5B_R6.SRR6795687.3seq.hg38.bed         Mayr_M_R1.SRR6795690.3seq.hg38.bed          Mayr_NB_R4.SRR6795689.3seq.hg38.bed               P19_siSrsf3_R1.SRR11918619.MACEseq.mm10.bed
HEK293_siControl_R2.SRR2922448.Aseq.hg38.bed  Mayr_CD5B_R6.SRR6795687.3seqTEonly.hg38.bed   Mayr_M_R1.SRR6795690.3seqTEonly.hg38.bed    Mayr_NB_R4.SRR6795689.3seqTEonly.hg38.bed         P19_siSrsf3_R2.SRR11918620.MACEseq.mm10.bed
HEK293_siHNRNPC_R1.SRR2922419.Aseq.hg38.bed   Mayr_CD5B_group.MultiSRR.3seq.hg38.bed        Mayr_M_R2.SRR6795691.3seq.hg38.bed          Mayr_NB_group.MultiSRR.3seq.hg38.bed              P19_siSrsf7_R1.SRR11918621.MACEseq.mm10.bed
HEK293_siHNRNPC_R2.SRR2922449.Aseq.hg38.bed   Mayr_CD5B_group.MultiSRR.3seqTEonly.hg38.bed  Mayr_M_R2.SRR6795691.3seqTEonly.hg38.bed    Mayr_NB_group.MultiSRR.3seqTEonly.hg38.bed        P19_siSrsf7_R2.SRR11918622.MACEseq.mm10.bed
Mayr_CD5B_R3.SRR6795684.3seq.hg38.bed         Mayr_GC_R1.SRR6795692.3seq.hg38.bed           Mayr_NB_R1.SRR1005606.3seq.hg38.bed         Mayr_PC_R1.SRR6795694.3seq.hg38.bed               mouseCortex_adult_R1and2.GSM1614167.PAPERCLIP.mm10.bed
Mayr_CD5B_R3.SRR6795684.3seqTEonly.hg38.bed   Mayr_GC_R1.SRR6795692.3seqTEonly.hg38.bed     Mayr_NB_R1.SRR1005606.3seqTEonly.hg38.bed   Mayr_PC_R1.SRR6795694.3seqTEonly.hg38.bed         mouseCortex_embryonic_R1and2.GSM1614169.PAPERCLIP.mm10.bed
Mayr_CD5B_R4.SRR6795685.3seq.hg38.bed         Mayr_GC_R2.SRR6795693.3seq.hg38.bed           Mayr_NB_R2.SRR1005607.3seq.hg38.bed         Mayr_allBcell_group.MultiSRR.3seq.hg38.bed        
Mayr_CD5B_R4.SRR6795685.3seqTEonly.hg38.bed   Mayr_GC_R2.SRR6795693.3seqTEonly.hg38.bed     Mayr_NB_R2.SRR1005607.3seqTEonly.hg38.bed   Mayr_allBcell_group.MultiSRR.3seqTEonly.hg38.bed

In OpenEBench there is one reference file per benchmarking challenge. According to @uniqueg, you do plan to have numerous benchmarks, but some of these reference files may be replicates of one another, so it would be good if members of the data and/or metrics teams could double-check them.
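To make the double-check easier, the listing can be grouped so that files differing only in replicate tag and accession end up together. This is a minimal sketch, not part of the APAeval code base: the function name `group_replicates` is hypothetical, and it assumes every file name follows the layout `<sample>_<condition>_<tag>.<accession>.<protocol>.<genome>.bed` seen in the listing above, where `<tag>` is something like `R1`, `R1and2` or `group`.

```python
from collections import defaultdict


def group_replicates(filenames):
    """Group reference files that share condition, protocol and genome.

    Assumes names of the form
    <sample>_<condition>_<tag>.<accession>.<protocol>.<genome>.bed
    (an assumption read off the listing, not a documented convention).
    """
    groups = defaultdict(list)
    for name in filenames:
        # Five dot-separated fields are expected; anything else raises ValueError.
        sample, _accession, protocol, genome, _ext = name.split(".")
        # Strip the trailing tag (R1, R2, R1and2, group, ...) from the sample field.
        condition, _, _tag = sample.rpartition("_")
        groups[(condition, protocol, genome)].append(name)
    return dict(groups)


files = [
    "HEK293_none_R1.SRR3543897.Aseq.hg38.bed",
    "HEK293_none_R2.SRR3543945.Aseq.hg38.bed",
    "Mayr_GC_group.MultiSRR.3seq.hg38.bed",
]
for key, members in group_replicates(files).items():
    print(key, len(members))
```

Any group with more than one member is a candidate set of replicates (or a replicate plus its pooled `group` file) that someone from the data or metrics team would still need to review by hand.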

AsierGonzalez commented 3 years ago

Regardless of whether any reference files are dropped, they should at least be renamed. Currently, the name of the reference file must match the pattern <challenge_id>.bed (line 57 in compute_metrics.py).
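As a rough illustration of what the `<challenge_id>.bed` constraint implies for the current files, the sketch below compares a list of challenge ids against the reference file names. It is not the code from compute_metrics.py: the helper `check_reference_names` and the example challenge ids are hypothetical, and only the `<challenge_id>.bed` pattern itself is taken from the comment above.

```python
def check_reference_names(challenge_ids, filenames):
    """Report challenges without a reference file and files matching no challenge.

    A reference file is considered a match only if it is named exactly
    "<challenge_id>.bed".
    """
    available = set(filenames)
    expected = {f"{cid}.bed": cid for cid in challenge_ids}
    # Challenges whose expected reference file is absent.
    missing = sorted(cid for name, cid in expected.items() if name not in available)
    # Files that do not correspond to any challenge id.
    unmatched = sorted(name for name in available if name not in expected)
    return missing, unmatched


# Hypothetical challenge ids, for illustration only.
missing, unmatched = check_reference_names(
    ["challenge_A", "challenge_B"],
    ["challenge_A.bed", "HEK293_none_R1.SRR3543897.Aseq.hg38.bed"],
)
print("challenges without a reference:", missing)
print("files matching no challenge:", unmatched)
```

With the present naming scheme every reference file would land in the second list, which is why renaming is needed whether or not any files are dropped.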

ninsch3000 commented 2 years ago

Obsolete, as OEB nomenclature has been adapted in the meantime and we are aware that there is one reference file per challenge. Identifier Q2 does NOT correspond to a challenge, but is an old tag for a particular metric.