pachterlab / seqspec

machine-readable file format for genomic library sequence and structure
MIT License

added specs for ISSAAC-seq, PIP-seq V2 and PIP-seq V3 #12

dbrg77 closed 1 year ago

dbrg77 commented 1 year ago

First attempt to create specs for ISSAAC-seq, PIP-seq V2 and PIP-seq V3.

dbrg77 commented 1 year ago

Thank you! I think it is a good idea to add example FASTQs.

I also suggest adding a readme.txt or similar that briefly describes where the FASTQs come from (in-house experiments, a public repository, etc.) and which organism they are from.

I see the 'assays' folder has been renamed to 'specs', so I guess we would have:

specs/MYASSAY
├── onlist.txt.gz
├── ...
├── spec.yaml
└── fastqs
    ├── readme.txt
    ├── R1.fastq.gz
    ├── R2.fastq.gz
    └── ...

What do you think?

sbooeshaghi commented 1 year ago

Yes, the folder name has changed just to be more specific. It's a good idea to also have a readme or JSON file that explains the origin of the FASTQ files. For now we can just have a TSV with one row per FASTQ file: the first column is the file name and the second column is the origin of the FASTQ file (GEO accession or FTP link). How does that sound?
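
As a rough illustration of the two-column TSV described above, here is a minimal Python sketch. The helper names (`write_manifest`, `read_manifest`) and the origin strings are hypothetical placeholders, not part of seqspec, and the thread does not fix a file name for the TSV.

```python
import csv
import io


def write_manifest(rows, handle):
    """Write (file name, origin) pairs as tab-separated lines, one per FASTQ."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, origin in rows:
        writer.writerow([name, origin])


def read_manifest(handle):
    """Return {file name: origin}; reject rows that do not have two columns."""
    manifest = {}
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(row)}")
        manifest[row[0]] = row[1]
    return manifest


# Placeholder origins: a real file would hold a GEO accession or an FTP link.
rows = [
    ("R1.fastq.gz", "ORIGIN_PLACEHOLDER_1"),
    ("R2.fastq.gz", "ORIGIN_PLACEHOLDER_2"),
]

buf = io.StringIO()  # stands in for a file opened in text mode
write_manifest(rows, buf)
manifest = read_manifest(io.StringIO(buf.getvalue()))
```

Keeping the file header-free matches the "first column, second column" description; a header row would be an extra convention that readers of the file would need to agree on.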

dbrg77 commented 1 year ago

Sounds good. I have added the fastqs with 1 million reads.
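
The thread does not say how the 1 million reads were selected. One simple possibility, sketched here purely as an assumption, is to keep the first N records of each gzipped FASTQ. The function name `head_fastq` is hypothetical, and the sketch assumes unwrapped FASTQ with exactly four lines per record.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_reads=1_000_000):
    """Copy the first n_reads records of a gzipped FASTQ; return records written.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    n_lines = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in islice(fin, 4 * n_reads):
            fout.write(line)
            n_lines += 1
    if n_lines % 4:
        raise ValueError(f"{src}: input ended in the middle of a record")
    return n_lines // 4
```

For paired files, calling this with the same `n_reads` on R1 and R2 keeps mates in the same order, since each output is a prefix of its input.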

sbooeshaghi commented 1 year ago

Great work! Thanks, Xi. I've merged the changes.

detrout commented 1 year ago

I just want to add a warning about including FASTQ reads.

We can certainly provide them for mouse data, but human-subject data is frequently under contract to keep raw genomic information private. So any files submitted should either come from a model organism, or someone probably needs to check that any human data is completely open, which may require getting lawyers and the IRB to sign off on it.

detrout commented 1 year ago

Also, there's the separate issue that people may not want to release their data publicly until after their paper is out.

sbooeshaghi commented 1 year ago

Fair warning, but I don't think this is a concern: sequencing reads will be made available here if they are already available in public repositories (e.g. GEO, SRA), for which approvals have already been obtained.

lakigigar commented 1 year ago

Moreover, the reads included with a seqspec are a demo, and there is no reason they have to be human or unpublished.