smith-chem-wisc / MetaMorpheus

Proteomics search software with integrated calibration, PTM discovery, bottom-up, top-down and LFQ capabilities
MIT License

Support additional input/output types #1124

Open rmillikin opened 6 years ago

rmillikin commented 6 years ago

Spectra inputs:

Database inputs:

Search results outputs:

Calibrated spectra outputs:

acesnik commented 6 years ago

RE: supporting .mzml.gz, I read somewhere a few years ago that compression introduces artifacts into RAW files. Does this also happen with mzMLs?
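One rough way to check this for a particular file is to round-trip its bytes through gzip and compare hashes. The sketch below is only an illustration: the function name and the sample bytes are made up, and it relies solely on the fact that gzip (DEFLATE) is a lossless format. It says nothing about RAW files or about any lossy numeric encoding chosen when an mzML was written.

```python
import gzip
import hashlib


def gzip_roundtrip_is_lossless(data: bytes) -> bool:
    """Compress and decompress `data`, then compare SHA-256 digests."""
    compressed = gzip.compress(data)
    restored = gzip.decompress(compressed)
    return hashlib.sha256(restored).digest() == hashlib.sha256(data).digest()


# Hypothetical stand-in for the bytes of an .mzML file.
sample = b"<mzML><spectrum id='scan=1'/></mzML>" * 1000
print(gzip_roundtrip_is_lossless(sample))
```

In practice one would read the real file with `open(path, "rb").read()` instead of using the sample bytes.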

rmillikin commented 6 years ago

.mgf is an input now, but I think it would also make a nice output. Several spectral annotators use it as a lightweight spectrum representation, e.g. http://spectrumviewer.org/upload.php

zrolfs commented 6 years ago

Do we want to output .mgfs, though? Users can simply use MSConvert for that. The only reason we output .mzML is for calibration and .mzML holds more information than .mgf for downstream analysis.

rmillikin commented 6 years ago

That's true, much easier

rmillikin commented 6 years ago

What features would you like to see in the HTML reports, @acesnik? Or anyone else?

acesnik commented 6 years ago

A few starter thoughts. Feel free to add to this!

acesnik commented 6 years ago

Here's a SnpEff HTML report that we could use as inspiration. SnpEffReport.zip

zrolfs commented 5 years ago

As a note on the spectral libraries, what if we just did a match-between-runs kind of thing? After each run, we could append newly identified peptides and their observed retention times to a database written to disk. This would allow for the identification of unfragmented peptides in future runs and could dramatically improve sensitivity.
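To make the idea concrete, here is a minimal sketch of the "append after each run" step. Everything in it is an assumption for illustration: the tab-separated layout, the function name `append_novel_peptides`, and the choice to key on the peptide sequence alone. It is not how MetaMorpheus stores anything.

```python
import csv
import os
import tempfile


def append_novel_peptides(library_path, identifications):
    """Append (sequence, retention_time) pairs not already in the library.

    `identifications` is an iterable of (sequence, retention_time_minutes).
    Returns the number of rows appended.
    """
    known = set()
    if os.path.exists(library_path):
        with open(library_path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                if row:
                    known.add(row[0])  # first column holds the sequence

    added = 0
    with open(library_path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for sequence, rt in identifications:
            if sequence not in known:
                writer.writerow([sequence, f"{rt:.3f}"])
                known.add(sequence)
                added += 1
    return added


# Hypothetical usage: two "runs" writing to the same library file.
library = os.path.join(tempfile.mkdtemp(), "library.tsv")
print(append_novel_peptides(library, [("PEPTIDEK", 12.5), ("ACDEFGR", 30.1)]))
print(append_novel_peptides(library, [("PEPTIDEK", 12.7), ("LMNPQR", 8.0)]))
```

A real version would probably also need charge state, modifications, and some way to reconcile differing retention times across runs, which this sketch ignores by keeping only the first observation.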

trishorts commented 4 years ago

maybe we should add .zip support?

zrolfs commented 4 years ago

> maybe we should add .zip support?

Can you clarify? We already allow the option to output .zip files for the individual files (#1789). Are you suggesting zipping the entire results folder?

trishorts commented 4 years ago

I was thinking of it as an input type. I had forgotten about the output issue. I don't have much of an opinion on output; I think .gz is fine for that. But lots of people use .zip, and since the xml is so large (especially after GPTMD), maybe we can accept the zip format for xml and raws?
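As a sketch of what accepting both formats for a database could look like, the helper below picks a reader from the file extension. The name `open_database_text`, the UTF-8 assumption, and the rule that a .zip must contain exactly one file are all assumptions made for the example, not existing MetaMorpheus behavior.

```python
import gzip
import io
import os
import tempfile
import zipfile
from contextlib import contextmanager


@contextmanager
def open_database_text(path):
    """Yield a text handle for a plain, .gz, or single-member .zip file."""
    lower = path.lower()
    if lower.endswith(".gz"):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield handle
    elif lower.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Ignore directory entries; require exactly one real file.
            members = [n for n in archive.namelist() if not n.endswith("/")]
            if len(members) != 1:
                raise ValueError(f"expected one file in {path}, found {len(members)}")
            with archive.open(members[0]) as raw:
                yield io.TextIOWrapper(raw, encoding="utf-8")
    else:
        with open(path, "r", encoding="utf-8") as handle:
            yield handle


# Hypothetical usage with a tiny zipped "database".
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "db.xml.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("db.xml", "<proteins/>")
    with open_database_text(zip_path) as handle:
        print(handle.read())
```

Streaming from the archive avoids unpacking a large GPTMD xml to disk first; whether the same approach makes sense for raw spectra files is a separate question.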

rmillikin commented 4 years ago

A zipped protein database would be fine w/ me. @acesnik did mention that spectra files might sometimes have compression artifacts, which is why we don't support compressed spectra files. Not sure of the evidence on that, though.