Open ChristineChichester opened 10 years ago
We can't use the IMS with _pageSize=all, as queries quickly become too large to process (due to the large number of mappings) and result in an HTTP 500 (Internal Server Error).
The only way to fix this would be to load the ncbigene -> uniprot and cw -> uniprot linksets in the LDC, but we generally avoid loading linksets in the LDC because we can then no longer use lenses.
So this is probably a "will not fix".
Just to clarify the issue: the pharmacology queries work up to approx. 10000 items, and also return CW identifiers. Are these not using the IMS? What would be the largest possible amount of data that could be returned with _pageSize=all? @NuriaQueralt, what is the highest count of targets/associations we would expect for a disease?
What I mean is that with the current LDA architecture the behaviour for _pageSize=all needs to be consistent across all calls.
In the past, we found that if we pass the _pageSize=all pharmacology query through the IMS, we often got queries that were too large to process.
In that case, we decided to keep OCRS->chembl and CW->chembl links in the LDC.
If we want to change it so that _pageSize=all behaves differently for different kinds of calls (e.g. if the number of targets per disease turns out to be small enough to be passed through the IMS), we need to make fairly significant changes to the LDA codebase.
Ok, I wasn't aware that the pharmacology calls are the exception (and that the mappings for them are in the LDC). I thought it was the other way round.
@danidi the highest count of genes for a disease in this version of DisGeNET (v2.1) is 5102 genes associated with the disease 'NEOPLASM MALIGNANT' (C0006826).
Setting the _pageSize parameter to "all" changes the results of Targets for Disease. Without the parameter, UniProt data is also returned.
With Default (no parameter used):
items: [
  {
    _about: "http://identifiers.org/ncbigene/1000",
    inDataset: "http://rdf.imim.es/disgenet-void.ttl#gene",
    forDisease: {
      _about: "http://linkedlifedata.com/resource/umls/id/C0002395",
      inDataset: "http://rdf.imim.es/disgenet-void.ttl#disease",
      name: "Alzheimer Disease"
    },
    seeAlso: [
      {
        _about: "http://purl.uniprot.org/uniprot/P19022",
        inDataset: "http://purl.uniprot.org"
      },
      {
        _about: "http://www.conceptwiki.org/concept/97eecd42-5ddd-437d-ab43-80dc7c5b2e50",
        inDataset: "http://www.conceptwiki.org",
        prefLabel_en: "Cadherin-2 (Homo sapiens)",
        prefLabel: "Cadherin-2 (Homo sapiens)"
      }
    ],
    closeMatch: {
      _about: "http://purl.uniprot.org/uniprot/P19022",
      inDataset: "http://purl.uniprot.org"
    },
    relatedMatch: [
      "http://purl.uniprot.org/uniprot/C9JMH2",
      "http://purl.uniprot.org/uniprot/A8MWK3",
      "http://purl.uniprot.org/uniprot/C9J8J8",
      {
        _about: "http://purl.uniprot.org/uniprot/P19022",
        inDataset: "http://purl.uniprot.org"
      },
      "http://purl.uniprot.org/uniprot/C9J126"
    ]
  },
With _pageSize=all:
[
  {
    _about: "http://identifiers.org/ncbigene/1000",
    inDataset: "http://rdf.imim.es/disgenet-void.ttl#gene",
    forDisease: {
      _about: "http://linkedlifedata.com/resource/umls/id/C0002395",
      inDataset: "http://rdf.imim.es/disgenet-void.ttl#disease",
      name: "Alzheimer Disease"
    }
  },
  {
    _about: "http://identifiers.org/ncbigene/100188754",
    inDataset: "http://rdf.imim.es/disgenet-void.ttl#gene",
    forDisease: {
      _about: "http://linkedlifedata.com/resource/umls/id/C0002395",
      inDataset: "http://rdf.imim.es/disgenet-void.ttl#disease",
      name: "Alzheimer Disease"
    }
  },
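To make the difference between the two responses quoted above explicit, here is a minimal Python sketch that lists the top-level fields present in the default item but absent from the _pageSize=all item. The dictionaries are abridged copies of the first item in each response; the helper name missing_keys is made up for illustration and is not part of the API.

```python
# First item of each response quoted in this thread, abridged to what is
# needed for the comparison (nested values shortened, keys unchanged).
default_item = {
    "_about": "http://identifiers.org/ncbigene/1000",
    "inDataset": "http://rdf.imim.es/disgenet-void.ttl#gene",
    "forDisease": {"name": "Alzheimer Disease"},
    "seeAlso": [
        {"_about": "http://purl.uniprot.org/uniprot/P19022"},
        {"_about": "http://www.conceptwiki.org/concept/97eecd42-5ddd-437d-ab43-80dc7c5b2e50"},
    ],
    "closeMatch": {"_about": "http://purl.uniprot.org/uniprot/P19022"},
    "relatedMatch": ["http://purl.uniprot.org/uniprot/C9JMH2"],
}

page_size_all_item = {
    "_about": "http://identifiers.org/ncbigene/1000",
    "inDataset": "http://rdf.imim.es/disgenet-void.ttl#gene",
    "forDisease": {"name": "Alzheimer Disease"},
}


def missing_keys(reference, candidate):
    """Return the top-level keys found in `reference` but not in `candidate`."""
    return sorted(set(reference) - set(candidate))


if __name__ == "__main__":
    # Fields dropped when _pageSize=all is used.
    print(missing_keys(default_item, page_size_all_item))
```

For the items shown here, the fields that disappear are exactly the mapping-related ones (seeAlso, closeMatch, relatedMatch), which matches the explanation that these mappings are not resolved when _pageSize=all is used.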