Closed ypriverol closed 8 years ago
@ypriverol I have a doubt about the validity of the mzTab file provided for testing. As far as I understand the PSM_ID column, the only thing that may differ on lines having the same PSM_ID is the accession. But there are several lines in the current file that have the same PSM_ID and different sequences. Is this valid? I can work around this in the importer, but could also give a warning about the validity of the actual mzTab file.
@julianu in the mzTab specification we don't have anything saying that this is unique; actually, I attached here a file with the table from the PSM example section.
In the PSM section:
Description: A unique identifier for a PSM within the file. If a PSM can be matched to multiple proteins, the same PSM should be represented on multiple rows with different accessions and the same PSM_ID.
Yes, but that states exactly what I would assume, and what is not the case in the test file: only the accessions should differ, nothing else.
Look at the lines with PSM_ID 923 in the test file: there are 6 lines, with the five sequences ILSILR, ILSLLR, LISLIR, LISLLR, LLSLLR. One sequence appears twice (with different accessions). I would assume that there should be 6 lines with 5 different PSM_IDs instead.
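The check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the importer's actual code: it assumes a tab-separated mzTab file whose PSM section has a `PSH` header row with `sequence`, `PSM_ID` and `accession` columns. The function name `find_inconsistent_psm_ids`, the accessions `ACC1` to `ACC7`, the choice of which sequence is duplicated, and the extra PSM_ID 924 row are all made up for the example; only PSM_ID 923 and the five sequences come from the test file.

```python
from collections import defaultdict


def find_inconsistent_psm_ids(lines):
    """Return {PSM_ID: sorted sequences} for PSM_IDs that map to more than one sequence."""
    columns = None
    sequences_by_id = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "PSH":
            # Header row of the PSM section: remember the column positions.
            columns = {name: index for index, name in enumerate(fields)}
        elif fields[0] == "PSM" and columns is not None:
            psm_id = fields[columns["PSM_ID"]]
            sequences_by_id[psm_id].add(fields[columns["sequence"]])
    return {
        psm_id: sorted(sequences)
        for psm_id, sequences in sequences_by_id.items()
        if len(sequences) > 1
    }


# Hypothetical rows modelled on the PSM_ID 923 case (accessions are invented).
SAMPLE = [
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tILSILR\t923\tACC1",
    "PSM\tILSLLR\t923\tACC2",
    "PSM\tLISLIR\t923\tACC3",
    "PSM\tLISLLR\t923\tACC4",
    "PSM\tLLSLLR\t923\tACC5",
    "PSM\tLLSLLR\t923\tACC6",
    "PSM\tPEPTIDER\t924\tACC7",
]

inconsistent = find_inconsistent_psm_ids(SAMPLE)
for psm_id, sequences in inconsistent.items():
    print(f"WARNING: PSM_ID {psm_id} has {len(sequences)} different sequences: {sequences}")
```

An importer could either emit such a warning and continue, or treat each (PSM_ID, sequence) pair as its own PSM.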
@julianu I will check the example; perhaps it is from one of our old exporters. I will check with a new file and let you know.
@julianu I found the lines, I will check the issue here and the rationale behind this. Can you check the PRIDE XML import and the metadata?
Yes, I will also work on the mzTab importer. I just need one additional check for whether PSM_IDs are used the way they are in the test file.
@julianu When we exported from PRIDE XML to mzTab we used the spectrum ID as the ID because, as you know, PRIDE XML removed all the PSMs and kept only one peptide sequence for each spectrum; the peptide is then repeated for each protein.
The problem is basically that the I/L peptides will reference the same spectrum. We can try to change our exporter. But the current validator works, and that means we are accepting this case. I don't have a guess as to what the best way to proceed is.
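To illustrate the I/L point: isoleucine and leucine have the same mass, so the five sequences reported for PSM_ID 923 collapse to a single sequence once every I is replaced by L. The snippet below is only a sketch of that normalisation; the helper name `il_key` is hypothetical and is not taken from the exporter or from PIA.

```python
def il_key(sequence):
    """Collapse isoleucine (I) and leucine (L) into one letter."""
    return sequence.replace("I", "L")


# The five sequences listed for PSM_ID 923 in the test file.
sequences_923 = ["ILSILR", "ILSLLR", "LISLIR", "LISLLR", "LLSLLR"]

normalised = {il_key(sequence) for sequence in sequences_923}
print(normalised)
```

Grouping by such a key is one way an importer could tell whether rows sharing a PSM_ID differ only by I/L, as opposed to being genuinely different identifications.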
@julianu here is the current example I would like to merge: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2011/03/PRD000397
@julianu any news on this?
The metadata are nicely parsed and merged now, #17
@julianu here is how we decided to export metadata from mzIdentML -> mzTab:
https://github.com/PRIDE-Utilities/ms-data-core-api/blob/master/src/main/java/uk/ac/ebi/pride/utilities/data/exporters/MzIdentMLMzTabConverter.java
and also from PRIDE XML to mztab:
We should check in the current version of PIA how we converted PRIDE XML and mzTab back to mzIdentML, especially the metadata. Some of the information can be redundant, like the software, etc.
Can you have a look?