overture-stack / score

Secure Cloud Object REsource: file transfer microservice
https://www.overture.bio/products/score
GNU Affero General Public License v3.0

BUG - genomic slice of CRAM only gives me the header #399

Closed: javierAPC closed this issue 6 days ago

javierAPC commented 2 months ago

Hi, I want to work with the CRAM files from two cohorts (PACA-CA and APGI-AU) in the ICGC ARGO database. I need the genomic information for the HLA region, so I'm trying to extract that region with the view command, but for any region of any file it returns only the header.

Description

I don't know whether this is a bug or whether I'm misunderstanding how it works. When I use the download command on one of these files, the file is complete and I can work with it without problems. However, slicing the same files with the view command doesn't work for the HLA region or any other region. What I actually want are the reads for that region, but since I only get the header, the FASTQ files are empty when I pipe the output to samtools.

Expected Behaviour

I expect a BAM/SAM file containing the header plus the alignment records for the requested region.

Actual Behaviour

The output contains only the header, so when it is piped to samtools fastq the resulting FASTQ files are empty.

Possible Fix

Steps to Reproduce

  1. Go to the 'score-client-5.10.0' directory.
  2. Run the command bin/score-client view --object-id 56a28309-670e-5a7f-8392-eab71281a5cd --query 6:28510120-33480577 --reference-file /home/victor/ref-fasta/GRCh38_full_analysis_set_plus_decoy_hla.fa --output-format BAM | samtools fastq -1 test/out.R1.fastq -2 test/out.R2.fastq
  3. The FASTQ files are empty and the slice contains only the header.
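One way to check which contig naming convention applies is to list the sequence names in the FASTA passed to --reference-file. This is a sketch that assumes the CRAM was aligned against a reference with the same contig names; list_contigs is a hypothetical helper, not part of score-client.

```shell
# list_contigs (hypothetical helper): print the sequence names of a FASTA
# file, i.e. the first whitespace-separated word of each ">" header line,
# with the leading ">" removed.
list_contigs() {
  awk '/^>/ { print substr($1, 2) }' "$1"
}
```

For example, list_contigs /home/victor/ref-fasta/GRCh38_full_analysis_set_plus_decoy_hla.fa | head shows whether the names start with chr; if they do, the --query region needs the same prefix.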

Your Environment

edsu7 commented 2 weeks ago

Hi @javierAPC ,

Please reach out to our helpdesk for any inquiries; issues raised there will be picked up more quickly.

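The location query is missing the chr prefix, so try chr6:28510120-33480577 instead of 6:28510120-33480577. The contigs in this file are named with the chr prefix (chr6 and so on), so a region on "6" matches no sequence and only the header is returned.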
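A minimal sketch of the pipeline from the issue with the corrected region, assuming a POSIX shell with score-client (run from the score-client-5.10.0 directory) and samtools on the PATH. add_chr_prefix and view_hla_reads are hypothetical helper names used only for illustration; the score-client and samtools flags are the ones already used in this thread.

```shell
# add_chr_prefix (hypothetical helper): ensure a region uses the "chr"
# prefix, e.g. 6:28510120-33480577 -> chr6:28510120-33480577.
add_chr_prefix() {
  case "$1" in
    chr*) printf '%s\n' "$1" ;;      # already prefixed, leave unchanged
    *)    printf 'chr%s\n' "$1" ;;   # add the missing prefix
  esac
}

# view_hla_reads (hypothetical helper): the command from the issue, with
# the region normalized before it is passed to --query.
# $1 = object id, $2 = region, $3 = reference FASTA, $4 = output directory
view_hla_reads() {
  bin/score-client view \
    --object-id "$1" \
    --query "$(add_chr_prefix "$2")" \
    --reference-file "$3" \
    --output-format BAM \
    | samtools fastq -1 "$4/out.R1.fastq" -2 "$4/out.R2.fastq"
}
```

For example: view_hla_reads 56a28309-670e-5a7f-8392-eab71281a5cd 6:28510120-33480577 /home/victor/ref-fasta/GRCh38_full_analysis_set_plus_decoy_hla.fa test. Separately, the samtools fastq documentation recommends collating input by read name (samtools collate) before writing paired -1/-2 files, so that mates end up in matching order in the two files.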

For example:

Running without chr yields a line count for the header only:

score-client view --object-id 56a28309-670e-5a7f-8392-eab71281a5cd --query 6:28510120-28510125 --reference-file /resources/GRCh38_hla_decoy_ebv.fa | wc -l

result:

Running...Viewing...
Validating repository connection...
3430

but running with chr returns the header plus reads:

score-client view --object-id 56a28309-670e-5a7f-8392-eab71281a5cd --query chr6:28510120-28510125 --reference-file /resources/GRCh38_hla_decoy_ebv.fa | wc -l

result:

Running...Viewing...
Validating repository connection...
3778
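Because wc -l also counts header lines, the two numbers above differ only by the alignment records. A small sketch that counts just the records (lines not starting with @), assuming score-client writes SAM text to stdout as the commands above appear to; count_alignments is a hypothetical helper.

```shell
# count_alignments (hypothetical helper): read SAM text on stdin and print
# the number of alignment records, i.e. lines that do not start with "@".
count_alignments() {
  # grep -c exits with status 1 when nothing matches; "|| true" keeps a
  # count of 0 from being treated as an error (e.g. under "set -e").
  grep -vc '^@' || true
}
```

For example, piping the chr6:28510120-28510125 command into count_alignments instead of wc -l should print a nonzero count, while the 6:28510120-28510125 version should print 0. From the counts shown, 3778 - 3430 = 348 lines were added by the chr query.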

Could you amend your query and give it a try?

Cheers, Edmund