topdownproteomics / sdk

Software solution for common top-down proteomics tasks
http://www.topdownproteomics.org/
MIT License

Proteoform lists and line lengths #30

Open trishorts opened 6 years ago

trishorts commented 6 years ago

I'm wondering about reading and writing multiple proteoforms. Does a proteoform have to be entirely on a single line? If not, is there a maximum line length (as in FASTA)? Does each proteoform have to start with a specific character, like ">", to mark where it begins?

rfellers commented 6 years ago

Excellent question, I've wondered the same thing. I don't believe that the standard addresses this, right? I've always imagined a file with a single ProForma Term on each line, but the idea of a FASTA-like setup is interesting. Would provide a place for metadata ... but honestly, I'm not sure that there would be that much to say about specific proteoforms.

acesnik commented 6 years ago

This is being discussed here, too: https://github.com/topdownproteomics/ProteoformNomenclatureStandard/issues/11

Personally, I'm in favor of a FASTA-like setup to allow specifying metadata (accessions, ontologies).

acesnik commented 6 years ago

Should we allow the files to be split at 60 characters per line? That could be pretty strange if tags get split in the middle.

Examples:

PROTEOFORMPROTEOFORMPROTEOFORMPROTEOFORMPROTEOFORMPROT[Phosp
ho]EOFORM

ROTEOFORMPROTEOFORMPROTEOFORMPROTEOFORMPROTEOFORMPROT[mass:7
9.98]EOFORM
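To make the concern concrete, here is a minimal Python sketch comparing a fixed 60-character split with a wrap that keeps bracketed tags whole. The function names `wrap_fixed` and `wrap_tag_aware` are hypothetical and not part of the SDK or the standard. The sketch assumes tags are delimited by `[` and `]` and are not nested.

```python
def wrap_fixed(sequence: str, width: int = 60) -> list[str]:
    """Split at exactly `width` characters, ignoring tag boundaries."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def wrap_tag_aware(sequence: str, width: int = 60) -> list[str]:
    """Wrap at up to `width` characters without breaking inside [...] tags.

    Assumes tags are not nested. A tag longer than `width` is kept whole,
    so its line may exceed `width`.
    """
    lines: list[str] = []
    current = ""
    i = 0
    while i < len(sequence):
        if sequence[i] == "[":
            end = sequence.find("]", i)
            # An unterminated tag is treated as running to the end of the input.
            token = sequence[i:] if end == -1 else sequence[i:end + 1]
        else:
            token = sequence[i]
        # Start a new line if this token would not fit on the current one.
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = ""
        current += token
        i += len(token)
    if current:
        lines.append(current)
    return lines


example = "PROTEOFORM" * 5 + "PROT[Phospho]EOFORM"
print(wrap_fixed(example))      # the tag is split as "[Phosp" / "ho]"
print(wrap_tag_aware(example))  # the tag moves whole to the second line
```

The tag-aware version trades uniform line length for readability: lines can be shorter than 60 characters, and a very long tag still produces an over-long line.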

acesnik commented 5 years ago

FASTQ files are never split at 60 characters, so there is precedent for requiring no line breaks. But if we're going to call it a ProForma FASTA file, we should allow line breaks.

rfellers commented 5 years ago

My 2 cents: we should allow line breaks if we expect people to edit these files manually. Given the specific set of modifications that users might want, I assume this will be the case. That being said, should we allow tags to be broken across two lines? That seems messy, but parsers wouldn't really care, since we'd just mash everything together.

acesnik commented 5 years ago

Despite the mess, I think we should allow tags to be broken across lines. Mashing together worked for all sorts of potatoes last week, so it should work for us here.
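The "mash everything together" approach can be sketched in a few lines of Python. This assumes a FASTA-like layout in which each record starts with a ">" header line followed by one or more sequence lines. That layout is only the proposal under discussion in this thread, not something the standard defines, and `read_proforma_fasta` is a hypothetical name.

```python
from typing import Iterable, Iterator


def read_proforma_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, proteoform) pairs from a hypothetical FASTA-like layout.

    Sequence lines are concatenated as-is, so a tag broken across two
    lines is restored when the lines are joined.
    """
    header = None
    parts: list[str] = []
    for raw in lines:
        # Strip only line endings, so spaces inside a wrapped tag survive.
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


text = """>example one
PROTEOFORMPROT[Phosp
ho]EOFORM
>example two
PEPTIDE
"""
for header, proteoform in read_proforma_fasta(text.splitlines()):
    print(header, proteoform)
```

One limitation of this sketch: a continuation line that happens to begin with ">" would be misread as a new header, so a real format would need to rule that out or escape it.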