igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

Warning: The index file is older than the data file #28

Closed MarkleLab closed 5 years ago

MarkleLab commented 5 years ago

Hi,

I finally got to setting up igvreports as per @jrobinso 's suggestion on the IGV Web App repo's issue page. I was able to set up the dependencies and launch and view the examples successfully.

I prepared a BED file with the sequence of searches that I would like to query, as per the formatting of example BED files in the repo. I do not get any warnings when I use the BED file with your example data.

When I use my own BAM file though, with hg38 reference link from example command, I get the following warning on every search in my BED file: [W::hts_idx_load2] The index file is older than the data file:PATH_TO_FILE/BH10281_1.bai

The files BH10281_1.bai and BH10281_1.bam as per 'Date Created' column were made at the same time. What exactly is this warning referring to? Could older mean an older reference (hg19)? I can download hg19 FASTA files and try with that if that's the case.

Best, MarkleLab

jrobinso commented 5 years ago

That's a warning about the BAM file index, not the reference or FASTA. It can be ignored if you're sure the .bai index is up to date. I don't know the hts library code for this, but it is probably expecting a creation date for the .bai file that is more recent than that of the BAM file, maybe using "<" rather than "<=". In any event it can be ignored.
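A rough way to see what the warning is reacting to, assuming the check compares file modification times rather than the 'Date Created' value shown in a file browser. The function names and any paths passed in are illustrative only, not part of igv-reports or htslib:

```python
import os


def index_is_older(data_path, index_path):
    """Return True if the index was last modified before the data file."""
    return os.path.getmtime(index_path) < os.path.getmtime(data_path)


def refresh_index_timestamp(index_path):
    """Set the index's modification time to now (like `touch`).

    This does not change the index contents, so only use it when the
    index is known to match the BAM file.
    """
    os.utime(index_path, None)
```

If there is any doubt that the index matches the BAM file, rebuilding it (for example with `samtools index`) is safer than adjusting the timestamp.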


MarkleLab commented 5 years ago

I am sure that the .bai index is up to date, so I will ignore the warning. Still, I cannot visualize my BAM file.

Here's some information about my query: Genome: hg19

Search BED file: I had to change the extension to upload it on github, but this is the search file I'm using. searchMe.txt

Command: create_report ~/Desktop/searchMe.bed https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta --tracks PATH_TO_FILE/BH10281_1.bam examples/variants/refgene.sort.bed.gz --output igvjs_viewer.html

Errors: None. I don't get any errors in the console in Chrome Developer Tools.

Warnings: [W::hts_idx_load2] The index file is older than the data file:PATH_TO_FILE/BH10281_1.bai

Output Screen (IGV Reports):

Screen Shot 2019-07-05 at 10 33 06 AM

Output Screen (IGV Web):

Screen Shot 2019-07-05 at 10 36 39 AM

I don't see any alignments for my file using IGV Reports, even when I zoom out on each of the 7 queries. I also noted that the gene name is different in IGV reports than the (correct) one in IGV Web app.

If the warning can be ignored, what could be causing this issue?

Let me know if you need any more information.

Best, MarkleLab

jrobinso commented 5 years ago

OK, I'll look into this. I'm just back from 9 days vacation so am swamped. I'm sure we can make this work.


MarkleLab commented 5 years ago

Great, thanks a lot @jrobinso! Let us know if you have any questions!

jrobinso commented 5 years ago

Hi @MarkleLab, could you reinstall and try again? I think this is fixed. The pypi version number is 0.92.

WRT gene name, that is totally dependent on the track you load. In fact, there is no concept of a gene in igv-reports, only tracks. To get the same track as igv.js's hg19 reference, use the following:

https://s3.amazonaws.com/igv.org.genomes/hg19/refGene.sorted.txt.gz

so something like this for the report

create_report ~/Desktop/searchMe.bed https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta --tracks <path to your bam> https://s3.amazonaws.com/igv.org.genomes/hg19/refGene.sorted.txt.gz --output test_viewer.html

MarkleLab commented 5 years ago

Hi @jrobinso,

Thanks for looking into this! I reinstalled and tried again with the newly updated repo, but I am still unable to visualize my own tracks.

I can however see the gene name now with the hg19 igv.js reference link you mentioned. Here's an updated screenshot:

Screen Shot 2019-07-07 at 5 35 58 PM

Just FYI, there weren't any errors in the console in Chrome Developer Tools, and I still see the same warning in my terminal window as before: [W::hts_idx_load2] The index file is older than the data file:PATH_TO_FILE/BH10281_1.bai

Let me know if you need more information or if there's anything I can do to help!

Best, MarkleLab

jrobinso commented 5 years ago

That's strange, could you zip and attach the html file produced? You can email it to igvteam (at) broadinstitute.org if you don't want to attach it here.

FWIW this is the command line I tested

create_report test/searchMe.bed https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg19/hg19.fasta --tracks https://s3.amazonaws.com/1000genomes/data/HG00096/alignment/HG00096.alt_bwamem_GRCh38DH.20150718.GBR.low_coverage.cram https://s3.amazonaws.com/igv.org.genomes/hg19/refGene.sorted.txt.gz --output test_viewer.html

jrobinso commented 5 years ago

One thought: this could be a chromosome name problem. igv.js does automatic mapping of chrX <-> X, etc., for you, but pysam does not. If the chromosome naming convention of your BED file differs from the convention used in your BAM file, this is exactly what you would see, because (for example) there are no alignments on "chrX"; they are on "X".
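A minimal sketch of how such a mismatch could be detected before building a report. The helper names are hypothetical and not part of igv-reports; the alignment sequence names are passed in as a plain list so the check itself needs nothing beyond the standard library:

```python
def bed_chromosomes(bed_path):
    """Collect the distinct sequence names in column 1 of a BED file."""
    names = set()
    with open(bed_path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and UCSC-style header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            names.add(line.split()[0])
    return names


def missing_from_alignments(bed_path, alignment_names):
    """Return BED sequence names that are absent from the alignment file."""
    return sorted(bed_chromosomes(bed_path) - set(alignment_names))
```

With pysam installed, the names for a BAM file can be read from the `references` attribute of `pysam.AlignmentFile`. A non-empty result (for example `["chrX"]` when the BAM uses `X`) would point to the naming problem described above.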

MarkleLab commented 5 years ago

Yes, of course. Here's the zip with the HTML file output: test_viewer.html.zip

MarkleLab commented 5 years ago

I tried the command you were working with and it works for me, but still no luck with my own data. I'm not sure if there is another naming convention in my BAM files, but I tried changing the chromosome from "chrX" to "X" in the BED file, as you mentioned. That crashed the program and produced the following error: KeyError: "sequence 'X' not present"

jrobinso commented 5 years ago

@MarkleLab OK, I looked at your report html. Your BAM file uses the following sequence names: 1, 2, 3, ..., X, Y, MT. The fasta you are trying to use follows the other convention: chr1, chr2, ..., chrX, chrY, chrM. This is the root of your problem. igv.js does some automatic chromosome aliasing for you, but igv-reports is built on pysam/samtools, and these tools do not. It would be complex to add, and no other command line tool in this space does this as far as I know. So the simplest fix is to use consistent sequence naming. Ideally you would use the same fasta for the reference as was used to align the reads, but if this is not available you can use this one:

https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/1kg_v37/human_g1k_v37_decoy.fasta

And this for the annotation track

https://s3.amazonaws.com/igv.org.genomes/hg19/refGene.sorted.b37.txt.gz

Finally, change your search bed to use sequence names consistent with your files: 1, 2, 3, ..., X, Y, MT.
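For a longer search file, the renaming could be scripted. The following is a sketch under the assumption that the BED file is tab-delimited and only contains the primary chromosomes (chr1..chr22, chrX, chrY, chrM); the function names are illustrative, and unplaced or alternate contigs, which are named differently in b37, are not handled:

```python
def to_b37_name(name):
    """Map a UCSC-style name (chr1, chrX, chrM) to b37 style (1, X, MT)."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    return name


def convert_bed(src_path, dst_path):
    """Copy a tab-delimited BED file, rewriting column 1 to b37 names."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Leave blank lines, comments, and header lines untouched.
            if fields[0] and not line.startswith(("#", "track", "browser")):
                fields[0] = to_b37_name(fields[0])
            dst.write("\t".join(fields) + "\n")
```

The converted file can then be passed to create_report together with the b37 fasta and annotation track listed above.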

MarkleLab commented 5 years ago

Hi @jrobinso,

Thanks a lot for looking into this!! By using the modified reference FASTA link you supplied, and changing my search query to use: 1,2,3,...,X,Y,MT, I can now visualize all my alignment files! 👍

However, now when I add the updated annotation track link you gave

https://s3.amazonaws.com/igv.org.genomes/hg19/refGene.sorted.b37.txt.gz

I get the following error:

Traceback (most recent call last):
  File "/Users/devteam/Documents/igvreports/venv/bin/create_report", line 10, in <module>
    sys.exit(main())
  File "/Users/devteam/Documents/igvreports/venv/lib/python3.7/site-packages/igv_reports/report.py", line 166, in main
    create_report(args)
  File "/Users/devteam/Documents/igvreports/venv/lib/python3.7/site-packages/igv_reports/report.py", line 81, in create_report
    trackObj = tracks.get_track_json_dict(track)
  File "/Users/devteam/Documents/igvreports/venv/lib/python3.7/site-packages/igv_reports/tracks.py", line 7, in get_track_json_dict
    type = get_track_type(format)
  File "/Users/devteam/Documents/igvreports/venv/lib/python3.7/site-packages/igv_reports/tracks.py", line 45, in get_track_type
    return dict[format]
KeyError: 'refgene'

Here's a current screenshot of my screen. I can see all my alignment tracks perfectly, but without a reference annotation track.

Screen Shot 2019-07-08 at 10 16 38 AM

Would it be possible to include the reference annotation track as well? Seeing the gene name would be very helpful to us.

jrobinso commented 5 years ago

Are you sure you are using version 0.92? You will see this error on earlier versions of igv-reports.
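One way to confirm which version a given environment actually has, sketched with the standard library (Python 3.8+). The distribution name `igv-reports` is assumed to match the name used on PyPI, and the helper name is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="igv-reports"):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

This should be run with the same interpreter that provides the create_report script (for example, the one inside the virtual environment), since a different environment may hold a different version.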

MarkleLab commented 5 years ago

Ah, my bad. My work computer had an outdated virtual environment. This is fixed now. I can visualize all my alignments as well as the reference sequence with version 0.92. 🎉

Thanks a lot for your help in resolving this issue @jrobinso! We really appreciate it. I'll go ahead and close the issue now.

Best, MarkleLab