Open julie-sullivan opened 3 years ago
cellbase_transcript_client.search(
    biotype=self._relevant_biotypes,
    include=self.CELLBASE_TRANSCRIPT_QUERY_INCLUDE,
    assembly=self._assembly,
    annotationFlags=InterpretationProcess.GENE_CODE_BASIC_TRANSCRIPT_SET,
    sort='id',
)
If you run this script more than once, you will get different results.
For the above to work, you need an additional index: {"transcripts.biotype": 1, id: 1}
CellBase (for transcripts only) sorts after pagination. It must sort before SKIP and LIMIT are applied. If a replica set is present, the query results will be incorrect.
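To illustrate why the order of operations matters, here is a minimal sketch in plain Python with no database involved. The function names and the toy ids are hypothetical and are not taken from CellBase. It assumes the unsorted storage order can differ between runs, which is what would be expected when different replica set members answer the query.

```python
# Toy model of paginated reads; no MongoDB involved.
# All names and ids here are hypothetical, for illustration only.

def paginate_then_sort(docs, skip, limit):
    """Apply SKIP/LIMIT first, then sort only the page that was cut out."""
    page = docs[skip:skip + limit]
    return sorted(page, key=lambda d: d["id"])


def sort_then_paginate(docs, skip, limit):
    """Sort the full result set first, then apply SKIP/LIMIT."""
    ordered = sorted(docs, key=lambda d: d["id"])
    return ordered[skip:skip + limit]


def fetch_all(docs, page_fn, page_size):
    """Walk through every page and collect the ids in the order returned."""
    ids = []
    for skip in range(0, len(docs), page_size):
        ids.extend(d["id"] for d in page_fn(docs, skip, page_size))
    return ids


# The same four documents, returned in a different natural order on each run.
run_a = [{"id": i} for i in ["T3", "T1", "T4", "T2"]]
run_b = [{"id": i} for i in ["T2", "T4", "T1", "T3"]]

broken_a = fetch_all(run_a, paginate_then_sort, 2)
broken_b = fetch_all(run_b, paginate_then_sort, 2)
fixed_a = fetch_all(run_a, sort_then_paginate, 2)
fixed_b = fetch_all(run_b, sort_then_paginate, 2)
```

With sorting applied after pagination, the two runs return the ids in different orders; with sorting applied first, both runs agree.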
In the database adapter I did this:
I also tried sorting after the projection. The files were the same, but they were truncated because the SORT failed:
You can opt in to external sorting: https://docs.mongodb.com/manual/reference/command/aggregate/#std-label-aggregate-cmd-allowDiskUse
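As a rough sketch of what opting in could look like from Python: the stage order below puts `$sort` ahead of `$skip` and `$limit`, and the call passes `allowDiskUse=True`. This is not the CellBase adapter code. The helper names are hypothetical, the field names are only assumed from the index mentioned above, and `collection` is assumed to be a PyMongo `Collection` (or anything with a compatible `aggregate` method).

```python
def build_transcript_pipeline(biotypes, skip, limit):
    """Build an aggregation pipeline that sorts BEFORE paginating.

    Hypothetical helper: the field names are assumed from the index
    suggested in this issue, not read from the CellBase source.
    """
    return [
        {"$match": {"transcripts.biotype": {"$in": list(biotypes)}}},
        {"$sort": {"id": 1}},   # sort the full match set first
        {"$skip": skip},        # then paginate
        {"$limit": limit},
    ]


def run_pipeline(collection, pipeline):
    """Run the pipeline, letting the server spill large sorts to disk.

    `collection` is assumed to be a pymongo Collection; allowDiskUse is
    the option described in the linked MongoDB documentation.
    """
    return list(collection.aggregate(pipeline, allowDiskUse=True))


pipeline = build_transcript_pipeline(["protein_coding"], skip=0, limit=100)
stage_names = [next(iter(stage)) for stage in pipeline]
```

The sort should be able to use the compound index on `transcripts.biotype` and `id` if one exists, in which case spilling to disk may not be needed at all; that has not been verified here.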
Going to test this.