igvteam / spacewalk

Spacewalk is an application for displaying and interacting with super-resolution chromatin tracing data in 3D. Spacewalk includes igv.js and juicebox.js instances for rapid and intuitive visual comparison and interaction between 3D data and 1D genomic data.
MIT License

Bed/bedgraph file not recognized #254

Closed gnir closed 1 year ago

gnir commented 1 year ago

Hello team,

I tried uploading a bed file (features exported from IGV) to go along with my point cloud data (in a separate file) and the bed file wasn't recognized. I converted the bed to bedgraph but it didn't help.

Please contact me if you need the spacewalk or bed files.

Thank you, Guy

jrobinso commented 1 year ago

@turner will probably have to answer, but does the file have the extension ".bed"? Does it load in igv-web at https://igv.org/app (use the "Tracks" menu)?

turner commented 1 year ago

@gnir send me a link to the file and I'll take a look.

gnir commented 1 year ago

Hi guys,

Sorry for not responding earlier; I have to modify my GitHub notifications. Here is a link to the bedgraph: https://www.dropbox.com/s/026qxtq943y7987/NC_016856.bedgraph?dl=0 The bed file was exported from IGV and then converted to a bedgraph.

jrobinso commented 1 year ago

The problem, I think, is with the chromosome names, e.g. "gi|378448274|ref|NC_016856.1|". That is probably not matching any sequence known to Spacewalk or IGV.

gnir commented 1 year ago

The bed file was downloaded from IGV 2.16.0, which I installed on my laptop. Here is a snapshot taken from IGV. [image]

jrobinso commented 1 year ago

How did you download a file from IGV?

So that means you could load the file in IGV desktop if you select that genome assembly. I don't think we have that assembly available in igv-webapp, and to be honest I don't know how you select a genome assembly in Spacewalk.


gnir commented 1 year ago

Once in IGV, I loaded this genome (Salmonella enterica...) from the 'Hosted Genomes' menu. Once the track was visible on the IGV browser I then right-clicked the track and chose export features.

jrobinso commented 1 year ago

OK, that's a bit non-optimal. If you need the file I can send you a download link. I'm assuming this was the gene track?

The chromosome names need to match the names Spacewalk uses, which it gets (I assume, @turner can confirm) from the CNDB file.
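A quick way to check for this kind of mismatch is to list the distinct names in the first column of the file and compare them with the name the trace file uses. The following is only a minimal Python sketch: the function name `bed_chrom_names` and the sample lines are made up for illustration and are not part of Spacewalk or igv.js.

```python
def bed_chrom_names(lines):
    """Return the set of sequence names found in column 1 of BED/bedGraph lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and the optional "track"/"browser" header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


# Hypothetical sample data: one feature line using the long NCBI-style name.
sample = [
    "track name=genes",
    "gi|378448274|ref|NC_016856.1|\t100\t200\tgeneA",
]
expected = {"NC_016856.1"}  # the name used in the trace file, per this thread

# Any name left over here will not match the sequence the viewer knows about.
unmatched = bed_chrom_names(sample) - expected
print(unmatched)
```

An empty `unmatched` set would mean every feature line refers to a known sequence name.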

gnir commented 1 year ago

My intention was to make a bed file compatible with Spacewalk, which is the gene track for this genome assembly. I thought that downloading such a track from IGV would be the way to go but I guess not?

jrobinso commented 1 year ago

There's no real relation between IGV desktop and Spacewalk, and many assemblies in desktop are 10 years old or more (don't know about this one). But in general IGV is a tool to view annotations, not obtain them. What are the chromosome names like in your CNDB file, if you know?

gnir commented 1 year ago

NC_016856.1. I'm now on NCBI trying to download the annotated genome and convert it to a Spacewalk-friendly format (bedgraph?).

jrobinso commented 1 year ago

Yes, I guess this assembly is a single sequence. The quick solution is to just replace the names (first column) of your .bed file with "NC_016856.1". Re bed vs bedGraph: SW (actually igv.js) can load both, but they are different formats with different purposes. "bed" just defines an annotated region; "bedGraph" includes a signal value for creating bar-chart-type graphs.
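The rename can be scripted rather than done by hand. Below is a rough Python sketch, assuming a plain-text BED file whose feature lines start with the sequence name; `rename_first_column` and the sample lines are hypothetical names used only for illustration.

```python
import re


def rename_first_column(lines, new_name):
    """Replace the sequence name (column 1) of every feature line with new_name."""
    renamed = []
    for line in lines:
        # Leave blank lines, comments, and "track"/"browser" header lines untouched.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            renamed.append(line)
            continue
        # Swap only the leading run of non-whitespace characters, so the
        # original tab or space separators are preserved.
        renamed.append(re.sub(r"^\S+", lambda _m: new_name, line, count=1))
    return renamed


# Hypothetical BED lines: chrom, start, end, then an optional feature name.
bed = [
    "track name=genes",
    "gi|378448274|ref|NC_016856.1|\t100\t200\tgeneA",
]
fixed = rename_first_column(bed, "NC_016856.1")
print("\n".join(fixed))
```

The same function works for a bedGraph file, since its fourth column (the signal value) is left as-is; only the first column is touched.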

jrobinso commented 1 year ago

There might also be an issue with igv.js not knowing about this assembly. I can add it, will do so sometime today.

gnir commented 1 year ago

Thank you for explaining.

I replaced the names column to be NC_016856.1, just like my sw file. However, I'm getting the same result. IGV on spacewalk is showing hg19. [image]

gnir commented 1 year ago

Also tried downloading the GFF and loading it to spacewalk but I'm having the same issue. [image]

jrobinso commented 1 year ago

Yes, as noted above this is never going to work with the hg19 assembly in igv.js. I just now added this assembly to igv-web (igv.js), but I don't know how SW triggers a genome switch.

BTW where did you find the gff file? The one I am using is 13 years old, it might have been updated.

gnir commented 1 year ago

If you check this NCBI webpage (https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000022165.1/) you will notice they have two assemblies. One is from GenBank and dates back to 2013. The other is from RefSeq (in fact from PGAP) and is dated May 2022.

So, I should wait to hear from Doug? Thank you for your help, Guy

jrobinso commented 1 year ago

Yes, Doug is out for a few hours

gnir commented 1 year ago

Thank you! Guy

jrobinso commented 1 year ago

OK I'll update igv to the 2022 one.

turner commented 1 year ago

In spacewalk, the current genome is dictated by whatever is specified in the ensemble file. Once an ensemble is loaded, a corresponding genome object is created and that is broadcast to the rest of the app.

IGV track tables are updated to align with this genome. There is currently no mechanism to prevent a user from loading an igv track of a different genome.

jrobinso commented 1 year ago

@turner In this case SW didn't do that (switch the igv.js genome). Maybe igv.js didn't recognize it. In that case igv.js should probably not be available, and a warning message should be displayed.

Since this morning I have added the genome with id "NC_016856.1" to SW. @gnir could you try loading your trace file again in spacewalk and see if it switches, or if it remains on hg19?

turner commented 1 year ago

Ahh, one thing I just realized. All the CNDB stuff has been a total work in progress and not fully baked. I now need to find out where in the CNDB they keep metadata. They haven't told me anything, so I have to go dig that stuff out myself.

turner commented 1 year ago

@jrobinso how did you add the genome id? In a branch? Where?

gnir commented 1 year ago

@jrobinso - tried it, still loading hg19. Guy

turner commented 1 year ago

@gnir can you point me to the CNDB file you are using? I have no idea where they currently store genome or any other data beyond genomic position and spatial position. I assumed they would provide me with a current schema. They have not.

gnir commented 1 year ago

Just emailed it to your gmail.

turner commented 1 year ago

@gnir yah, this is a bug I need to get sorted. I'll let you know when we have a fix.

gnir commented 1 year ago

Thank you, I appreciate your help. Guy

turner commented 1 year ago

@gnir I have a potential fix for this issue. Here is a test app to try it out: https://deploy-preview-260--spacewalk-site.netlify.app/

I created a shareable URL that you can use to demonstrate the fix: https://tinyurl.com/2mx9fpxr

A few observations. The SW file takes a long time to load. You have 1.4 million renderable 3D points, which accounts for the long load time. However, on screen I don't see anywhere near that many points. I suspect you have many 3D points with identical or very similar spatial locations that all cluster together, appearing as a single larger point. Is that possible?

Anyway, let me know if this works for you.

gnir commented 1 year ago

Thanks @turner!

Some observations/comments:

  1. In the shareable URL, when I walk along the refseq, I don't see the corresponding position in the trace. However, it does work on the test app where I loaded the files myself.
  2. This SW file comes from Super-Resolution Tracing (SRT). These data consist of many 3D points, because each genomic locus is covered by hundreds or even thousands of oligos and the localizations of each oligo are all recorded; thus, the file is big. With Diffraction-limited Tracing (DLT), by contrast, each genomic locus is an average of many detected oligos, so each locus gets one 3D position. SRT data therefore carry much more structural information, and are also more computationally heavy. In addition, each fluorophore on each oligo can 'blink' several times, so many of these localizations will be highly similar (depending on the detection accuracy).
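
To illustrate the difference in data shape only: collapsing many localizations into one centroid per locus gives the one-position-per-locus layout described for DLT. This Python sketch is not how DLT data are acquired, and `centroid_per_locus` and the `(locus, x, y, z)` record layout are assumptions made for the example.

```python
from collections import defaultdict


def centroid_per_locus(localizations):
    """Average all (x, y, z) localizations that share a locus label."""
    # Per locus: running sums of x, y, z and a count of localizations.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for locus, x, y, z in localizations:
        acc = sums[locus]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
    return {
        locus: (sx / n, sy / n, sz / n)
        for locus, (sx, sy, sz, n) in sums.items()
    }


# Hypothetical records: two localizations for locus "a", one for locus "b".
records = [
    ("a", 0.0, 0.0, 0.0),
    ("a", 2.0, 4.0, 6.0),
    ("b", 1.0, 1.0, 1.0),
]
centroids = centroid_per_locus(records)
print(centroids)
```

Averaging this way discards the within-locus structure that makes SRT data valuable, so it is a trade-off rather than a fix.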

I hope this helps. Thank you for the fix! Guy

jrobinso commented 1 year ago

@gnir If you have a means to convert these to "cndb" files, and can index them, they should load faster.

gnir commented 1 year ago

Hi @jrobinso - I'm trying to read about the cndb format but all I can see in the spacewalk app is this.

jrobinso commented 1 year ago

@gnir I'm not sure who's developing that to be honest, @turner supports it but I do not know how you create them. Erez should be able to direct you to the right people.

gnir commented 1 year ago

Thank you, I'll reach out to Erez. Guy

turner commented 1 year ago

These fixes are now in the production app: https://spacewalk-site.netlify.app/

turner commented 1 year ago

@gnir here is the reason the app runs very laggy and all user interaction is slow and almost unusable. I took a look at your data: the things in blue are identical and are slowing the app. [image]

All duplicate xyz points get rendered to the same location on screen. The user is unable to disambiguate them.

Cull the duplicates and app performance should improve.
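
One possible way to cull them before writing the file is sketched below in Python. It assumes the points are plain (x, y, z) tuples; in a real trace file each point is also tied to a genomic region, which would need to be part of the key. The name `cull_duplicates` and the optional `decimals` rounding are illustrative choices, not something Spacewalk provides.

```python
def cull_duplicates(points, decimals=None):
    """Keep the first occurrence of each (x, y, z) point, preserving order.

    If decimals is given, points are compared after rounding to that many
    decimal places, so near-identical points are treated as duplicates.
    """
    seen = set()
    unique = []
    for point in points:
        if decimals is None:
            key = tuple(point)
        else:
            key = tuple(round(c, decimals) for c in point)
        if key not in seen:
            seen.add(key)
            unique.append(tuple(point))
    return unique


# Hypothetical points: one exact duplicate and one near-duplicate.
pts = [
    (1.0, 2.0, 3.0),
    (1.0, 2.0, 3.0),
    (1.0004, 2.0, 3.0),
    (4.0, 5.0, 6.0),
]
exact = cull_duplicates(pts)
rounded = cull_duplicates(pts, decimals=2)
print(len(pts), len(exact), len(rounded))
```

If the duplicates come from coordinates being written with too few digits, increasing the output precision is the better fix, since rounding-based culling would then throw away real localizations.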

gnir commented 1 year ago

Hmm... I need to check if I print the 3D positions with enough accuracy. Thanks for letting me know. Guy