Open MaxThunderdome opened 9 months ago
@di-hardt just in case of notification issues
Thanks for reporting, @MaxThunderdome. As a quick workaround: set the review filter to "Both", download the results as CSV, open the file in Excel or LibreOffice, and filter by the is_swiss_prot/is_trembl columns. It is also possible that all reported peptides are present in both SwissProt and TrEMBL.
TL;DR: The peptide count is reported incorrectly on the initial search. The review filter is applied after streaming from the database, not in the DB query itself. The peptides that the filter removes are not subtracted from the initial DB count, and that count is what ultimately gets rendered and used to calculate the pagination and the paginated queries.
# body.ok.json
{
"precursor": 859.49506802369,
"lower_precursor_tolerance_ppm": 5,
"upper_precursor_tolerance_ppm": 5,
"variable_modification_maximum": 0,
"is_reviewed": true
}
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data "@body.ok.json" https://macpepdb.mpc.rub.de/api/peptides/search -o ok.json
grep -o sequence ok.json |wc -l
> 1020
jq '.count' ok.json
> null # Expected, as count was not requested
# body.err.json
{
"precursor": 859.49506802369,
"lower_precursor_tolerance_ppm": 5,
"upper_precursor_tolerance_ppm": 5,
"variable_modification_maximum": 0,
"is_reviewed": true,
"include_count": true
}
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data "@body.err.json" https://macpepdb.mpc.rub.de/api/peptides/search -o err.json
grep -o sequence err.json |wc -l
> 1020
jq '.count' err.json
> 89684
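The mismatch above (1020 returned sequences versus a count of 89684) can be illustrated with a small, self-contained Python sketch. It is not the MaCPepDB code, and it does not claim to show how the linked fix works: the rows, the function names, and the assumption that "reviewed" means is_swiss_prot is true are all made up for illustration. It only shows why a count taken before a post-query filter disagrees with the filtered result and inflates the pagination.

```python
import math

# Hypothetical stand-in for rows streamed from the database.
ROWS = [
    {"sequence": "PEPTIDEA", "is_swiss_prot": True, "is_trembl": False},
    {"sequence": "PEPTIDEB", "is_swiss_prot": False, "is_trembl": True},
    {"sequence": "PEPTIDEC", "is_swiss_prot": True, "is_trembl": True},
    {"sequence": "PEPTIDED", "is_swiss_prot": False, "is_trembl": True},
]


def search_stale_count(rows, reviewed_only):
    """Count first, filter afterwards: the count ignores the review filter."""
    count = len(rows)  # taken before the filter is applied
    peptides = [r for r in rows if r["is_swiss_prot"] or not reviewed_only]
    return {"count": count, "peptides": peptides}


def search_consistent_count(rows, reviewed_only):
    """Filter first, then count what is actually returned."""
    peptides = [r for r in rows if r["is_swiss_prot"] or not reviewed_only]
    return {"count": len(peptides), "peptides": peptides}


def page_count(count, page_size):
    """Number of pages a UI would render for a given count."""
    return math.ceil(count / page_size)


stale = search_stale_count(ROWS, reviewed_only=True)
consistent = search_consistent_count(ROWS, reviewed_only=True)
```

With a page size of 2, the stale count yields two pages although only one page of reviewed peptides exists.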
I'll work out a fix when I'm back in the office!
Thank you for the response. The site is very helpful. I am currently working around the issue by calling the API and filtering for Swiss-Prot = true in Python. It seems like the CSV shows only a limited number of columns: mass, sequence, and missed cleavages. It's not a huge issue. It just adds extra time, which could become a larger issue over many iterations. I appreciate your help!
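A client-side filter of the kind described above might look like the following sketch. It makes two assumptions that the thread does not confirm: that the JSON response carries its records under a key named "peptides", and that each record exposes the same is_swiss_prot / is_trembl flags as the CSV columns. The sample payload is invented.

```python
import json


def keep_swiss_prot(peptides):
    """Return only the records flagged as Swiss-Prot.

    Records without the flag are dropped rather than guessed at.
    """
    return [p for p in peptides if p.get("is_swiss_prot") is True]


# Invented sample payload; the "peptides" key is an assumption.
sample = json.loads(
    """
    {
      "peptides": [
        {"sequence": "PEPTIDEA", "is_swiss_prot": true,  "is_trembl": false},
        {"sequence": "PEPTIDEB", "is_swiss_prot": false, "is_trembl": true},
        {"sequence": "PEPTIDEC", "is_swiss_prot": true,  "is_trembl": true},
        {"sequence": "PEPTIDED"}
      ]
    }
    """
)

reviewed = keep_swiss_prot(sample["peptides"])
reviewed_sequences = [p["sequence"] for p in reviewed]
```

Comparing len(reviewed) with any count the API returns is a quick way to spot the discrepancy discussed in this issue.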
Hey Max,
sorry, it took me a while. The issue is fixed in https://github.com/mpc-bioinformatics/macpepdb/commit/cf9f996dc49469f8bc0bb000f436069facf61178 and the fix is deployed.
As for your last comment: Do you use the key include_metadata: true? That should include all columns in the CSV.
Here is an example:
{
"precursor": 1213.587355312,
"modifications": [],
"lower_precursor_tolerance_ppm": 5,
"upper_precursor_tolerance_ppm": 5,
"variable_modification_maximum": 0,
"is_reviewed": true,
"include_metadata": true
}
The API documentation is available online: https://macpepdb.mpc.rub.de/docs/api#search-by-mass
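For Python callers, the same request body can be assembled with the standard library, as in the sketch below. The helper name build_search_request and its parameters are hypothetical. The URL, headers, and JSON keys are copied from the examples in this thread. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint as used in the curl examples in this thread.
SEARCH_URL = "https://macpepdb.mpc.rub.de/api/peptides/search"


def build_search_request(precursor, tolerance_ppm=5, reviewed_only=True,
                         include_metadata=True):
    """Build (but do not send) a POST request for a mass search."""
    body = {
        "precursor": precursor,
        "modifications": [],
        "lower_precursor_tolerance_ppm": tolerance_ppm,
        "upper_precursor_tolerance_ppm": tolerance_ppm,
        "variable_modification_maximum": 0,
        "is_reviewed": reviewed_only,
        "include_metadata": include_metadata,
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request(1213.587355312)
```

Passing the result to urllib.request.urlopen(request) would send it. The Accept header here requests JSON as in the curl calls above, since the header needed for CSV output is not stated in this thread.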
Let me know if I can close the issue.
Hello, I noticed that when you search for a peptide there are issues with the filter options. I searched for a particular mass and it came up with 306 matches, for example. This number does not change whether the option is set to both, Swiss-Prot, or the theoretical database. I even double-checked a few peptides: they were marked with x/check or check/x, but they still appear in both searches. I'm hoping to remove the theoretical entries and only search for proteins found in the UniProt database.