levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

question about how cvParams are read #150

Closed colin-combe closed 2 months ago

colin-combe commented 2 months ago

Hi,

This file: https://ftp.pride.ebi.ac.uk/pride/data/archive/2020/10/PXD021417/XLpeplib_Beveridge_QEx-HFX_DSS_R3.mzid has cvParams like: <cvParam accession="MS:1003024" cvRef="PSI-MS" name="OpenPepXL:score" value="0.435441904242644"/>

If element is a dict from Pyteomics representing the containing SpectrumIdentificationItem, then:

element['OpenPepXL:score']
[0.435441904242644, 0.435441904242644]

Why is the value a list containing the score twice? I don't see this behaviour with other score cvParams.

Here is the whole SII element:

<SpectrumIdentificationItem passThreshold="1" rank="1" peptide_ref="PEP_15576937776419520329" calculatedMassToCharge="650.014416361171016" experimentalMassToCharge="650.683898925781023" chargeState="3" id="SII_17534158789279221875">
                    <PeptideEvidenceRef peptideEvidence_ref="PEV_15854021124151871431"/>
                    <Fragmentation>
                        <IonType charge="1" index="2">
                            <FragmentArray measure_ref="Measure_mz" values="217.082504272460938"/>
                            <FragmentArray measure_ref="Measure_int" values="0.046763770282269"/>
                            <userParam name="cross-link_chain" unitName="xsd:string" value="alpha"/>
                            <userParam name="cross-link_ioncategory" unitName="xsd:string" value="ci"/>
                            <cvParam accession="MS:1001224" cvRef="PSI-MS" name="frag: b ion"/>
                        </IonType>
                        <IonType charge="1" index="1 1">
                            <FragmentArray measure_ref="Measure_mz" values="147.113037109375 175.119247436523438"/>
                            <FragmentArray measure_ref="Measure_int" values="0.064021043479443 0.047833994030952"/>
                            <userParam name="cross-link_chain" unitName="xsd:string" value="alpha beta"/>
                            <userParam name="cross-link_ioncategory" unitName="xsd:string" value="ci ci"/>
                            <cvParam accession="MS:1001220" cvRef="PSI-MS" name="frag: y ion"/>
                        </IonType>
                        <IonType charge="2" index="8">
                            <FragmentArray measure_ref="Measure_mz" values="901.477661132812386"/>
                            <FragmentArray measure_ref="Measure_int" values="0.061802297830582"/>
                            <userParam name="cross-link_chain" unitName="xsd:string" value="alpha"/>
                            <userParam name="cross-link_ioncategory" unitName="xsd:string" value="xi"/>
                            <cvParam accession="MS:1001224" cvRef="PSI-MS" name="frag: b ion"/>
                        </IonType>
                        <IonType charge="2" index="2 5">
                            <FragmentArray measure_ref="Measure_mz" values="148.097000122070313 910.484375"/>
                            <FragmentArray measure_ref="Measure_int" values="0.092913068830967 0.157566979527473"/>
                            <userParam name="cross-link_chain" unitName="xsd:string" value="alpha beta"/>
                            <userParam name="cross-link_ioncategory" unitName="xsd:string" value="ci xi"/>
                            <cvParam accession="MS:1001220" cvRef="PSI-MS" name="frag: y ion"/>
                        </IonType>
                        <IonType charge="3" index="2 7">
                            <FragmentArray measure_ref="Measure_mz" values="488.281890869140625 578.32806396484375"/>
                            <FragmentArray measure_ref="Measure_int" values="0.02044477686286 0.016408979892731"/>
                            <userParam name="cross-link_chain" unitName="xsd:string" value="beta alpha"/>
                            <userParam name="cross-link_ioncategory" unitName="xsd:string" value="xi xi"/>
                            <cvParam accession="MS:1001220" cvRef="PSI-MS" name="frag: y ion"/>
                        </IonType>
                    </Fragmentation>
                    <cvParam accession="MS:1003024" cvRef="PSI-MS" name="OpenPepXL:score" value="0.435441904242644"/>
                    <cvParam accession="MS:1002511" cvRef="PSI-MS" name="cross-link spectrum identification item" value="13852174798197314617"/>
                    <userParam name="spectrum_index" unitName="xsd:integer" value="936"/>
                    <userParam name="xl_type" unitName="xsd:string" value="cross-link"/>
                    <userParam name="xl_term_spec_alpha" unitName="xsd:string" value="ANYWHERE"/>
                    <userParam name="xl_term_spec_beta" unitName="xsd:string" value="ANYWHERE"/>
                    <userParam name="isotope_error" unitName="xsd:integer" value="2"/>
                    <userParam name="precursor_mz_error_ppm" unitName="xsd:double" value="0.892654113124713"/>
                    <userParam name="OpenPepXL:score" unitName="xsd:double" value="0.435441904242644"/>
                    <userParam name="OpenPepXL:xquest_score" unitName="xsd:double" value="24.930034723401334"/>
                    <userParam name="OpenPepXL:xcorr xlink" unitName="xsd:double" value="0.036764705882353"/>
                    <userParam name="OpenPepXL:xcorr common" unitName="xsd:double" value="0.088888888888889"/>
                    <userParam name="OpenPepXL:match-odds" unitName="xsd:double" value="10.085589554670875"/>
                    <userParam name="OpenPepXL:intsum" unitName="xsd:double" value="1.558219123631716"/>
                    <userParam name="OpenPepXL:intsum_alpha" unitName="xsd:double" value="1.233005528136801"/>
                    <userParam name="OpenPepXL:intsum_beta" unitName="xsd:double" value="0.335336404976317"/>
                    <userParam name="OpenPepXL:total_current" unitName="xsd:double" value="13.86175970826298"/>
                    <userParam name="OpenPepXL:wTIC" unitName="xsd:double" value="0.083491570677781"/>
                    <userParam name="OpenPepXL:TIC" unitName="xsd:double" value="0.112411350104624"/>
                    <userParam name="OpenPepXL:prescore" unitName="xsd:double" value="0.0"/>
                    <userParam name="OpenPepXL:log_occupancy" unitName="xsd:double" value="13.123764215168332"/>
                    <userParam name="OpenPepXL:log_occupancy_alpha" unitName="xsd:double" value="13.39544594390558"/>
                    <userParam name="OpenPepXL:log_occupancy_beta" unitName="xsd:double" value="12.852082486431083"/>
                    <userParam name="matched_xlink_alpha" unitName="xsd:integer" value="2"/>
                    <userParam name="matched_xlink_beta" unitName="xsd:integer" value="3"/>
                    <userParam name="matched_linear_alpha" unitName="xsd:integer" value="4"/>
                    <userParam name="matched_linear_beta" unitName="xsd:integer" value="2"/>
                    <userParam name="ppm_error_abs_sum_linear_alpha" unitName="xsd:double" value="3.378405511379242"/>
                    <userParam name="ppm_error_abs_sum_linear_beta" unitName="xsd:double" value="5.882312119007111"/>
                    <userParam name="ppm_error_abs_sum_xlinks_alpha" unitName="xsd:double" value="10.418529272079468"/>
                    <userParam name="ppm_error_abs_sum_xlinks_beta" unitName="xsd:double" value="10.57919200261434"/>
                    <userParam name="ppm_error_abs_sum_linear" unitName="xsd:double" value="4.213041047255198"/>
                    <userParam name="ppm_error_abs_sum_xlinks" unitName="xsd:double" value="10.514926910400392"/>
                    <userParam name="ppm_error_abs_sum_alpha" unitName="xsd:double" value="5.725113431612651"/>
                    <userParam name="ppm_error_abs_sum_beta" unitName="xsd:double" value="8.700440049171448"/>
                    <userParam name="ppm_error_abs_sum" unitName="xsd:double" value="7.077534621412104"/>
                    <userParam name="precursor_total_intensity" unitName="xsd:double" value="1.04681134375e06"/>
                    <userParam name="precursor_target_intensity" unitName="xsd:double" value="1.04681134375e06"/>
                    <userParam name="precursor_signal_proportion" unitName="xsd:double" value="1.0"/>
                    <userParam name="precursor_target_peak_count" unitName="xsd:integer" value="2"/>
                    <userParam name="precursor_residual_peak_count" unitName="xsd:integer" value="0"/>
                    <userParam name="selected" unitName="xsd:string" value="false"/>
                    <userParam name="xl_pos1_protein" unitName="xsd:string" value="1336"/>
                    <userParam name="xl_pos2_protein" unitName="xsd:string" value="229"/>
                    <userParam name="xl_target_decoy_alpha" unitName="xsd:string" value="decoy"/>
                    <userParam name="xl_target_decoy_beta" unitName="xsd:string" value="decoy"/>
                    <userParam name="delta_score" unitName="xsd:double" value="0.940748793407122"/>
                    <userParam name="XFDR:is_intraprotein" unitName="xsd:string" value="false"/>
                    <userParam name="XFDR:is_interprotein" unitName="xsd:string" value="true"/>
                    <userParam name="OpenPepXL:id" unitName="xsd:string" value="DTNGLVKFK-EGNWKR-a6-b4"/>
                    <userParam name="XFDR:used_for_FDR" unitName="xsd:integer" value="1"/>
                    <userParam name="XFDR:fdr_type" unitName="xsd:string" value="q-value"/>
                    <userParam name="XFDR:FDR" unitName="xsd:double" value="0.903061224489796"/>
                    <cvParam accession="MS:1000894" cvRef="PSI-MS" name="retention time" value="2997.691380000000208" unitAccession="UO:0000010" unitCvRef="UO"/>
                </SpectrumIdentificationItem>

Thanks, Colin

levitsky commented 2 months ago

I think this is because OpenPepXL:score is present twice in this excerpt, once as a cvParam and once as a userParam, so Pyteomics collects both values:

<cvParam accession="MS:1003024" cvRef="PSI-MS" name="OpenPepXL:score" value="0.435441904242644"/>
...
<userParam name="OpenPepXL:score" unitName="xsd:double" value="0.435441904242644"/>
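
To check a file for this kind of name collision, something like the following could be used. It is only a minimal sketch with the standard library: it does not use Pyteomics and is not how Pyteomics parses the file internally. The helper name `collect_params` and the trimmed-down XML string are made up for illustration; the three param lines are copied from the element above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed-down copy of the SII element from this thread (relevant params only).
SII_XML = """
<SpectrumIdentificationItem id="SII_17534158789279221875">
    <cvParam accession="MS:1003024" cvRef="PSI-MS" name="OpenPepXL:score" value="0.435441904242644"/>
    <userParam name="OpenPepXL:score" unitName="xsd:double" value="0.435441904242644"/>
    <userParam name="OpenPepXL:xquest_score" unitName="xsd:double" value="24.930034723401334"/>
</SpectrumIdentificationItem>
"""


def collect_params(element):
    """Group the direct cvParam/userParam children of `element` by name.

    Hypothetical helper: returns {name: [(tag, value), ...]}.
    """
    values = defaultdict(list)
    for child in element:
        # Drop an XML namespace prefix such as "{http://...}cvParam" if present.
        tag = child.tag.rsplit("}", 1)[-1]
        if tag in ("cvParam", "userParam"):
            values[child.get("name")].append((tag, child.get("value")))
    return dict(values)


params = collect_params(ET.fromstring(SII_XML))

# Names that occur more than once are the ones that end up with several values.
duplicates = {name: vals for name, vals in params.items() if len(vals) > 1}
print(duplicates)
```

Here `duplicates` holds only `OpenPepXL:score`, with one entry coming from the cvParam and one from the userParam, which matches the two identical values in the list from the question.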

colin-combe commented 2 months ago

I see, thanks.