Closed fgvieira closed 3 years ago
@fgvieira I actually do not know, I suspect pysam is trying to access the file based on information in the header. If so this is out of my control. Do you have, or can you create, a "bam" file version of the cram to see if the error persists?
I found this in the samtools reference (pysam, used by igv-reports, wraps samtools)
When reading a CRAM the @SQ headers are interrogated to identify the reference sequence MD5sum (M5: tag) and the local reference sequence filename (UR: tag). Note that http:// and ftp:// based URLs in the UR: field are not used, but local fasta filenames (with or without file://) can be used.
To create a CRAM the @SQ headers will also be read to identify the reference sequences, but M5: and UR: tags may not be present. In this case the -T and -t options of samtools view may be used to specify the fasta or fasta.fai filenames respectively (provided the .fasta.fai file is also backed up by a .fasta file).
I cannot reproduce the issue with my test CRAM files, but I suspect there is some specific combination of headers that cause samtools to try to compute a MD5Sum from the reference specified in the header. I think the fix will be to pass the reference specified in igv-reports to pysam with the "-T" argument. I am actually on vacation now until July 12 and will not try a release until I return, but I have pushed this (potential) fix. If you are able to build from source and try it please report back here. Feel free to ping this issue after July 12 as a reminder to release.
I tried the latest commit, but got an error:
Traceback (most recent call last):
File "bin/create_report", line 8, in <module>
sys.exit(main())
File "lib/python3.9/site-packages/igv_reports/report.py", line 271, in main
create_report(args)
File "lib/python3.9/site-packages/igv_reports/report.py", line 51, in create_report
reader = utils.getreader(track, None, args.fasta)
File "lib/python3.9/site-packages/igv_reports/utils.py", line 11, in getreader
return bam.BamReader(path)
TypeError: __init__() missing 1 required positional argument: 'fasta'
I tried adding the --fasta
command line option, but it is not recognized.
OK I'll have to look at this next week, I'm traveling now. There is no. "--fasta" option in igv-reports, that is a pysam option, sorry if I wasn't clear.
Many people use this with CRAM files, I don't know what it is about yours that is triggering this. However, the error is coming from pysam, a python package. I suggest installing that package and trying to query the file for alignments, knowing if that is successful or not would be good information.
Looking a bit more into the code, I think the issue might be here: https://github.com/igvteam/igv-reports/blob/a0c558a8a16d7168ca550ff0d1816c000af3f8ff/igv_reports/feature.py#L246-L247
@fgvieira Sorry for the delay, could you pull and try again when you have a chance?
You were on the right track, but its a little more subtle. For CRAM format we need to distinguish between input and output formats (CRAM is output as BAM in the report).
Seems to be working now. :+1: thanks!
Dear all,
I am trying to create an IGV report:
But I am getting the error below:
What does this error mean? If I try access the URL, it works fine. And why is IGV trying to access ENA if I provide the reference genome?
thanks,