gosling-lang / gos

A declarative interactive genomics visualization library for Python.
https://gosling-lang.github.io/gos
MIT License

Plotting Single Co-ordinate BED file on custom assembly #120

Closed: Gr1m3y closed this issue 1 year ago.

Gr1m3y commented 2 years ago

Hi,

First, thanks for the great work! I am really liking the layout of Gosling so far!

I am having an issue plotting some data. I have a microbial assembly with two chromosomes and a TSV file with three columns: chromosome, position, and value (the file has no header). I want to plot the values along each chromosome, so I have code like this:

import gosling as gos

# Headerless, tab-separated file with three columns: chrom, position, value
data = gos.csv(
    "../data/06_coverage/23b_coverage.bed",
    separator="\t",
    headerNames=["chrom", "position", "value"],
    chromosomeField="chrom",
    genomicFields=["position"],
)

gos.Track(data).mark_bar().encode(
    x=gos.X("position:G"),
    y=gos.Y("value:Q", axis="left"),
).view(
    # Custom assembly: (chromosome name, length in bp)
    assembly=[
        ("NC_012345.1", 1_234_567),
        ("NC_012346.1", 45000),
    ]
)

When I run the plot, I just get a blank canvas with the chromosome labels and positions along the top, but no data shows up. I am not sure what I am missing here. Does the file have to have two coordinates for each feature?

Thanks in advance for the help! I hope this is not something obvious that I am missing.

Keep up the great work! Excited to see this tool continue to develop!

sehilyi commented 2 years ago

Hi @Gr1m3y, thank you for reaching out!

The code looks good to me. If you see an empty track, Gosling was probably unable to load the data it needs to render. One possibility is that the URL of the data is incorrect. You can check this, for example, by opening the Console tab of Chrome Developer Tools and looking for related errors, e.g., GET https://localhost:8080/[filename].bed net::ERR_ABORTED 500.
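Alongside the browser console, a quick Python-side check can rule out a wrong relative path. The sketch below assumes that a relative path like the one passed to csv() is resolved against the Python process's current working directory (for example, the directory the notebook was started from); `check_local_data_path` is a hypothetical helper, not part of gos. It only confirms that the file exists on disk, not that the browser can actually fetch it.

```python
from pathlib import Path
from typing import Optional


def check_local_data_path(path: str, base_dir: Optional[str] = None) -> Path:
    """Resolve a (possibly relative) data path and raise if the file is missing.

    Hypothetical helper, not part of gos. Assumption: relative paths are
    resolved against ``base_dir``, defaulting to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # If ``path`` is absolute, ``base / path`` is just ``path``.
    resolved = (base / path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"data file not found: {resolved}")
    return resolved


# Usage: show where the relative path from the issue actually points.
# check_local_data_path("../data/06_coverage/23b_coverage.bed")
```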

Another possibility is that csv() does not parse the contents of the BED file properly because of incorrect parameters. For example, the following tab-separated data I created is displayed correctly in Gos:

NC_012345.1	1	1
NC_012345.1	10000	10000
NC_012345.1	1000000	1000000
NC_012346.1	1	1
NC_012346.1	100	100
NC_012346.1	1000	1000
[Screenshot, 2022-11-30: the data above rendered in Gos]
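To rule out a parsing problem, one option is to read the file with the same settings passed to csv() (tab separator, no header, three fields) and compare the chromosome names against the custom assembly. Rows whose chromosome is not in the assembly, or lines separated by spaces instead of tabs, are plausible reasons for a track to come out empty. The helper below is a hypothetical stdlib sketch, not part of gos; treating positions outside 0 to the chromosome length as suspicious is an assumption about how the coordinates are numbered.

```python
import csv
from typing import Iterable, List, Tuple


def validate_coverage_tsv(path: str, assembly: Iterable[Tuple[str, int]]) -> List[str]:
    """Return a list of problems found in a headerless chrom/position/value TSV.

    Hypothetical helper (not part of gos). An empty list means every row has
    three tab-separated fields, a chromosome from ``assembly``, an integer
    position within that chromosome's length, and a numeric value.
    """
    sizes = dict(assembly)
    problems: List[str] = []
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for lineno, row in enumerate(reader, start=1):
            if not row:  # skip blank lines
                continue
            if len(row) != 3:
                problems.append(f"line {lineno}: expected 3 tab-separated fields, got {len(row)}")
                continue
            chrom, position, value = row
            if chrom not in sizes:
                problems.append(f"line {lineno}: chromosome {chrom!r} is not in the assembly")
                continue
            try:
                pos = int(position)
                float(value)
            except ValueError:
                problems.append(f"line {lineno}: position or value is not numeric")
                continue
            if not 0 <= pos <= sizes[chrom]:
                problems.append(f"line {lineno}: position {pos} is outside {chrom} (length {sizes[chrom]})")
    return problems


# Usage with the assembly from the issue:
# validate_coverage_tsv("../data/06_coverage/23b_coverage.bed",
#                       [("NC_012345.1", 1_234_567), ("NC_012346.1", 45000)])
```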

Does the file have to have two coordinates for each feature?

Nope. A single coordinate should also work.

Let me know if you still cannot resolve the issue!

Gr1m3y commented 1 year ago

Hi @sehilyi

I still seem to be having issues. I wonder if it has to do with the amount of data: is there a practical limit on how many rows Gosling can handle? Beyond that, I am not sure what it could be. I did not see any errors when loading the data, and to verify, I tried to set everything up exactly as you did. I will experiment with some smaller datasets to see what happens.
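For experimenting with smaller datasets, one option is to shrink the file by averaging values in fixed-size windows. The output keeps the same headerless three-column layout, so the same csv() call should be able to load it. This is a hypothetical stdlib sketch: `bin_coverage_tsv` and the 1,000 bp default window are illustrative choices, not part of gos, and each output row uses the window's start coordinate as its position.

```python
import csv
from collections import defaultdict
from typing import DefaultDict, Tuple


def bin_coverage_tsv(src: str, dst: str, bin_size: int = 1000) -> int:
    """Average values in fixed-size windows and write chrom, window_start, mean.

    Hypothetical helper (not part of gos). Returns the number of rows written.
    """
    sums: DefaultDict[Tuple[str, int], float] = defaultdict(float)
    counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)
    with open(src, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row:  # skip blank lines
                continue
            chrom, position, value = row[0], int(row[1]), float(row[2])
            # Integer division maps every position to the start of its window.
            key = (chrom, position // bin_size * bin_size)
            sums[key] += value
            counts[key] += 1
    with open(dst, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for key in sorted(counts):  # by chromosome name, then window start
            chrom, start = key
            writer.writerow([chrom, start, sums[key] / counts[key]])
    return len(counts)
```

Averaging is only one choice; for coverage-style data, the maximum per window may preserve peaks better, at the cost of overstating flat regions.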

manzt commented 1 year ago

Hi, just going through old issues. We haven't done any benchmarking, but yes, very large tabular files may be challenging for Gosling to handle. There is a memory limit in the web browser (about 4 GB), and exceeding it can cause problems. If you can share more information about the data, I can try to help debug further.
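To gauge whether file size could be the problem, and to have concrete numbers to share, a short summary of the file (size on disk, total rows, and row count and position range per chromosome) may help. This is a hypothetical stdlib sketch, not part of gos, assuming the same headerless tab-separated layout; it streams the file line by line, so the file itself does not need to fit in memory.

```python
import csv
import os
from typing import Any, Dict


def summarize_tsv(path: str) -> Dict[str, Any]:
    """Summarize a headerless chrom/position/value TSV (hypothetical helper)."""
    per_chrom: Dict[str, Dict[str, int]] = {}
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row:  # skip blank lines
                continue
            chrom, pos = row[0], int(row[1])
            stats = per_chrom.setdefault(chrom, {"rows": 0, "min_pos": pos, "max_pos": pos})
            stats["rows"] += 1
            stats["min_pos"] = min(stats["min_pos"], pos)
            stats["max_pos"] = max(stats["max_pos"], pos)
    total_rows = sum(s["rows"] for s in per_chrom.values())
    return {
        "file_bytes": os.path.getsize(path),
        "total_rows": total_rows,
        "chromosomes": per_chrom,
    }
```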