mpc-bioinformatics / macpepdb-web


Peptide Search #4

Open MaxThunderdome opened 9 months ago

MaxThunderdome commented 9 months ago

Hello, I noticed that when you search for a peptide there are issues with the filter options. I searched for a particular mass and it came up with 306 matches, for example. This number does not change whether I select the option for both, Swiss-Prot, or the theoretical database. I even double-checked a few peptides: they were marked with x/check or check/x, but they still appear in both searches. I'm hoping to remove the theoretical options and only search for proteins found in the UniProt database.

MaxThunderdome commented 9 months ago

@di-hardt just in case of notification issues

di-hardt commented 9 months ago

Thanks for reporting, @MaxThunderdome. As a quick workaround: set the review filter to "Both", download the results as CSV, open the file in Excel/LibreOffice, and filter by the is_swiss_prot/is_trembl columns. It might also be that all reported peptides are present in both SwissProt & TrEMBL.
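For anyone who prefers to script the CSV workaround instead of using a spreadsheet, here is a minimal Python sketch. The column names is_swiss_prot and is_trembl come from the comment above; the way boolean values are written in the CSV (for example "true" or "1") and the sample rows are assumptions made for illustration, so check them against a real download first.

```python
import csv
import io

# Assumption: the CSV encodes booleans as one of these strings.
TRUTHY = {"true", "1", "yes"}


def filter_swiss_prot(csv_text):
    """Return only the rows whose is_swiss_prot column is truthy."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row.get("is_swiss_prot", "").strip().lower() in TRUTHY
    ]


# Hypothetical sample data; a real export will have more columns.
sample = (
    "sequence,is_swiss_prot,is_trembl\n"
    "PEPTIDEK,true,false\n"
    "ANOTHERK,false,true\n"
    "BOTHK,true,true\n"
)

kept = filter_swiss_prot(sample)
print([row["sequence"] for row in kept])
```

To use it on a downloaded file, read the file contents into a string and pass that to filter_swiss_prot.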

TL;DR: The peptide count is not reported correctly on the initial search. The review filter is applied after streaming from the database, not in the DB query itself. The peptides removed by the filter are not subtracted from the initial DB count, which is what ultimately gets rendered and is used to calculate the pagination and the paginated queries.

# body.ok.json
{
    "precursor": 859.49506802369,
    "lower_precursor_tolerance_ppm": 5,
    "upper_precursor_tolerance_ppm": 5,
    "variable_modification_maximum": 0,
    "is_reviewed": true
}
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data "@body.ok.json" https://macpepdb.mpc.rub.de/api/peptides/search -o ok.json
grep -o sequence ok.json |wc -l
> 1020
jq '.count' ok.json
> null        # Expected as count was not requested
# body.err.json
{
    "precursor": 859.49506802369,
    "lower_precursor_tolerance_ppm": 5,
    "upper_precursor_tolerance_ppm": 5,
    "variable_modification_maximum": 0,
    "is_reviewed": true,
    "include_count": true
}
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data "@body.err.json" https://macpepdb.mpc.rub.de/api/peptides/search -o err.json
grep -o sequence err.json |wc -l
> 1020
jq '.count' err.json
> 89684
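
The mismatch described in the TL;DR can be illustrated with a small toy model in Python. This is not the MaCPepDB implementation: the function name, the page size, and the in-memory rows are all hypothetical. It only shows how a count taken before the review filter diverges from the number of peptides that survive the filter, and how that skews pagination.

```python
import math

PAGE_SIZE = 50  # hypothetical page size, chosen for illustration


def reported_vs_actual(rows, want_reviewed):
    """Compare a pre-filter count with the post-filter result size."""
    # The count is taken from the unfiltered query result...
    db_count = len(rows)
    # ...while the review filter is only applied afterwards.
    filtered = [r for r in rows if r["is_swiss_prot"] == want_reviewed]
    return {
        "reported_count": db_count,
        "actual_count": len(filtered),
        "reported_pages": math.ceil(db_count / PAGE_SIZE),
        "actual_pages": math.ceil(len(filtered) / PAGE_SIZE),
    }


# Toy data: every fourth peptide is marked as Swiss-Prot.
rows = [{"sequence": f"PEP{i}", "is_swiss_prot": i % 4 == 0} for i in range(200)]
result = reported_vs_actual(rows, want_reviewed=True)
print(result)
```

In this toy model the UI would offer four pages although the filtered result fits on one, which mirrors the inflated count seen in err.json.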

I'll work out a fix when I'm back in the office!

MaxThunderdome commented 9 months ago

Thank you for the response. The site is very helpful. I am currently working around the issue by calling the API and filtering for Swiss-Prot = true in Python. It seems like the CSVs are showing only a limited number of columns: mass, seq, and missed cleavages. It's not a huge issue. It just adds extra time, which is potentially a larger issue over many iterations. I appreciate your help!
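
A rough sketch of the kind of API workaround described above, for other readers. The endpoint URL and request keys are taken from the curl examples earlier in this thread. The response layout is an assumption: the "peptides" list key and a per-peptide is_swiss_prot field are guesses that should be checked against the API documentation before relying on them.

```python
import json
import urllib.request

API_URL = "https://macpepdb.mpc.rub.de/api/peptides/search"


def build_body(precursor, tolerance_ppm=5):
    """Build a search body using the keys shown in the curl examples."""
    return {
        "precursor": precursor,
        "lower_precursor_tolerance_ppm": tolerance_ppm,
        "upper_precursor_tolerance_ppm": tolerance_ppm,
        "variable_modification_maximum": 0,
    }


def fetch(body):
    """POST the search body and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def keep_swiss_prot(response):
    """Client-side filter. Assumes a 'peptides' list with 'is_swiss_prot' flags."""
    return [p for p in response.get("peptides", []) if p.get("is_swiss_prot")]


# Hypothetical response used to demonstrate the filter without a network call.
example_response = {
    "peptides": [
        {"sequence": "PEPTIDEK", "is_swiss_prot": True, "is_trembl": False},
        {"sequence": "ANOTHERK", "is_swiss_prot": False, "is_trembl": True},
    ]
}
print([p["sequence"] for p in keep_swiss_prot(example_response)])
```

A live call would look like keep_swiss_prot(fetch(build_body(859.49506802369))), assuming the response really has the layout guessed here.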

di-hardt commented 8 months ago

Hey Max,

sorry, this took me a while. The issue is fixed in https://github.com/mpc-bioinformatics/macpepdb/commit/cf9f996dc49469f8bc0bb000f436069facf61178 and deployed.

As for your last comment: Do you use the key include_metadata: true? That should include all columns in the CSV. Here is an example:

{
    "precursor": 1213.587355312,
    "modifications": [],
    "lower_precursor_tolerance_ppm": 5,
    "upper_precursor_tolerance_ppm": 5,
    "variable_modification_maximum": 0,
    "is_reviewed": true,
    "include_metadata": true
}

The documentation is available online: https://macpepdb.mpc.rub.de/docs/api#search-by-mass

Let me know if I can close the issue.