statisticalbiotechnology / maracluster

Matthew The's implementation of MaRaCluster
Apache License 2.0

Required attribute 'defaultDataProcessingRef' not present! #4

Closed BioComSoftware closed 7 years ago

BioComSoftware commented 7 years ago

Hello. I'm not sure if the Git issues are monitored. I'm not finding an online community or mailing list, so I thought I'd push this here.

I'm running some MGF files through maracluster, ending with either .ms2 files or .mzML files (depending on setting).

However, regardless of whether I use maracluster --mzML, or maracluster --ms2 and then convert the files to mzML with msconvert...I get the following error when trying to use the files:

    /(...)/XMLHandler.cpp(103): While loading '/(...)/sample_consensus10.part1.mzML': Required attribute 'defaultDataProcessingRef' not present!
    Error: Unable to read file (- due to that error of type Parse Error in: /(...)/XMLHandler.cpp@104-void OpenMS::Internal::XMLHandler::fatalError(OpenMS::Internal::XMLHandler::ActionMode, const OpenMS::String&, OpenMS::UInt, OpenMS::UInt) const)
    Error occurred in line 151 of file /(...)/MzMLFile.cpp (in function: void OpenMS::MzMLFile::safeParse(const OpenMS::String&, OpenMS::Internal::XMLHandler*)) !

Is this due to formatting changes between the last maracluster release and the current mzML format?

Thanks!

MatthewThe commented 7 years ago

Thank you for reporting this. We do monitor the issues here.

I have not seen this error before, but a formatting change could indeed be responsible. Unfortunately, I cannot replicate this error on my local setup. Which version of msconvert are you running, and did you use the compiled binary of maracluster on the release page or did you compile it yourself?

BioComSoftware commented 7 years ago

Hi Matthew,

Thanks for the quick response.

I'm using the binary for Ubuntu 13.04 desktop: https://github.com/statisticalbiotechnology/maracluster/releases/download/rel-0-01/maracluster-v0-01-linux-amd64.deb

msconvert:

    ProteoWizard release: 3.0.10114 (2016-10-19)
    ProteoWizard MSData: 3.0.10112 (2016-10-19)
    ProteoWizard Analysis: 3.0.10112 (2016-10-19)
    Build date: Oct 20 2016 00:43:25

The issue is identical regardless of whether I run:

    maracluster consensus -l ./clusters/sample.clusters_p10.tsv -f ./consensus -o ./consensus/sample_consensus_10.mzML

or

    maracluster consensus -l ./clusters/sample.clusters_p10.tsv -f ./consensus -o ./consensus/sample_consensus_10.ms2
    msconvert ./consensus/sample_consensus_10.ms2 --mzML

MatthewThe commented 7 years ago

Thanks for the information.

I can confirm that the mzML file created directly by MaRaCluster fails to load in OpenMS. It is indeed missing the "defaultDataProcessingRef" attribute. I will try to fix that now.

However, when I run the ms2 file through the most recent version of msconvert (2016-10-26), I do get a defaultDataProcessingRef attribute on the spectrumList tag in my mzML file. Do you have that as well? I tried a few OpenMS tools on this mzML file and they did not report any errors. Could you tell me which OpenMS version and tool you are running?
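To answer the question about whether the attribute is present without opening the file in OpenMS, a quick check along these lines may help. It is only a sketch: the function name `has_default_dp_ref` is made up for illustration, and it assumes the `<spectrumList ...>` start tag and its attributes sit on a single line, which is not guaranteed for every mzML writer.

```shell
# Succeeds (exit status 0) when the first <spectrumList ...> start tag in the
# given mzML file carries a defaultDataProcessingRef attribute.
# Assumption: the start tag and all of its attributes are on one line.
has_default_dp_ref() {
    grep -m 1 '<spectrumList' "$1" | grep -q 'defaultDataProcessingRef='
}

# Report the status of every mzML file passed on the command line.
for f in "$@"; do
    if has_default_dp_ref "$f"; then
        echo "ok       $f"
    else
        echo "MISSING  $f"
    fi
done
```

Saved as a script, this could be run on both the file written directly by maracluster and the one produced by msconvert to see which of the two lacks the attribute.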

BioComSoftware commented 7 years ago

Hi Matthew,

Thanks again for the fast reply. I'm using OpenMS 2.1.0.

I have tried both approaches: using msconvert to convert the .ms2 files to .mzML, and having maracluster output .mzML directly.

It may be important that *.MGF files are the INPUT to maracluster, so there is no 'defaultDataProcessingRef' in those MGF files. However, some form of 'defaultDataProcessingRef' is required in the final mzML output. Maybe that's the issue.

The process is this:

  1. An OpenMS pipeline creates MGF files. The last node producing the MGF files is a FileConverter node that takes processed .mzML IN and just dumps MGF out.
  2. Then these MGF files are run through maracluster like so:

    cd /to/the/mgf/files/dir

    ls | while read file; do echo "$(pwd)/$file" >> ./LIST; done

    maracluster batch -b ./LIST -f ./clusters -a sample -t -10 -c -10

    maracluster consensus \
        -l ./clusters/sample.clusters_p10.tsv \
        -f ./consensus \
        -o ./consensus/sample_consensus_10.ms2

  3. Then these files WILL be passed into OpenMS:
       - an OMSSAAdapter
       - an xTandemAdapter
       - an MSGFAdapter

Unfortunately the OMSSAAdapter and xTandemAdapter only accept .mzML. They are the ones giving the errors after I convert the MGF back to .mzML
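The list-building part of step 2 can be made a little more robust. The sketch below writes only MGF files to the list, always as absolute paths, and overwrites the list instead of appending to it, so rerunning it does not duplicate entries. The helper name `write_mgf_list` is hypothetical, and `-maxdepth` and `-iname` assume GNU or BSD `find`.

```shell
# Write the absolute paths of all MGF files directly inside directory $1
# to the list file $2, one per line, in sorted order.
write_mgf_list() {
    mgf_dir=$(cd "$1" && pwd) || return 1
    # -iname matches both .mgf and .MGF; the list file itself is never
    # picked up because it does not end in .mgf.
    find "$mgf_dir" -maxdepth 1 -type f -iname '*.mgf' | sort > "$2"
}
```

With that in place, `write_mgf_list /to/the/mgf/files/dir ./LIST` would replace the `ls | while read` loop, and the `maracluster batch -b ./LIST ...` call from step 2 could follow unchanged.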

BioComSoftware commented 7 years ago

Hi Matthew,

I apologize. The .ms2 -> .mzML conversion by msconvert DOES, in fact, work!

However, the *.mzML created directly by maracluster does not have the defaultDataProcessingRef attribute.

MatthewThe commented 7 years ago

Great to hear!

I added the defaultDataProcessingRef in the mzML output now as well, together with some other fields that were less critical, but possibly useful anyway. I will create a new binary release for this.

The new release (v0.02) will also feature a new "-S"/"-splitMassChargeStates" flag for generating consensus spectra, which copies each spectrum with multiple precursor candidates into separate spectra, one per candidate, each with its own (identical) peak list. This might be useful for you, as the OpenMS mzML parser seems to allow only a single precursor per spectrum.
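To judge whether splitting would matter for a given file, one rough check is to count how many spectra declare more than one precursor. This is a sketch under assumptions: the function name `count_multi_precursor_spectra` is invented here, and it relies on each `<precursorList count="N">` start tag being on its own line with a double-quoted `count` attribute.

```shell
# Print the number of lines in mzML file $1 that open a <precursorList>
# declaring more than one precursor (count attribute of 2 or higher).
count_multi_precursor_spectra() {
    # grep -c exits non-zero when it counts 0 lines; "|| true" keeps the
    # function usable in scripts that run under "set -e".
    grep -c -E '<precursorList count="([2-9]|[1-9][0-9]+)"' "$1" || true
}
```

A result of 0 would suggest that every spectrum already has a single precursor, in which case the flag should make no difference to the output.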

BioComSoftware commented 7 years ago

Fantastic! Thanks!

(Emailed reply to MatthewThe's comment of 1 November 2016 at 11:56.)
