jupyterlab / jupyterlab-demo

Demonstrations of JupyterLab
https://mybinder.org/v2/gh/jupyterlab/jupyterlab-demo/master?urlpath=lab

Dataset for Fasta demonstration. #37

Closed: Zsailer closed this 7 years ago

Zsailer commented 7 years ago

Here is a proposed fasta file for demonstration. It contains 110 Zika virus genomes assembled from samples across the Americas. It was published in Nature this year and is pretty relevant for a general audience.

I've also included a README.md file in the data folder with a short description and link to the paper.

A couple things to consider:

  1. It's a lot of data. I had to increase the default iopub_data_rate_limit for it to render in the notebook. If y'all think this is too large, let me know. I can look for a single protein or something similar.
  2. The sequences are nucleotide sequences (4 letters), not amino acid sequences (20 letters). That just means the visualization will be less colorful.
  3. On the other hand, this is a super cool dataset. They traced the evolution of the Zika virus through the Americas.
  4. It's also impressive that such a large visualization renders and scrolls so easily (without hanging).
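To make the 4-letter versus 20-letter point concrete, here is a minimal Python sketch (not part of this PR) that parses FASTA text and lists the distinct symbols a viewer would need to assign colors to. The function names `read_fasta` and `alphabet` and the toy sequences are hypothetical illustrations, not the Zika data; real assemblies may also contain ambiguity codes such as `N`, which would add extra symbols.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; store the previous one, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            # Sequence lines may be wrapped; normalize case and join later.
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def alphabet(records):
    """Return the sorted distinct symbols across all sequences (one color each)."""
    symbols = set()
    for seq in records.values():
        symbols.update(seq)
    return sorted(symbols)


if __name__ == "__main__":
    # Hypothetical toy inputs, not taken from the Zika dataset.
    dna = ">seq1 hypothetical\nATGCGT\nACGT\n>seq2 hypothetical\nTTGACA\n"
    protein = ">prot1 hypothetical\nMKTAYIAKQR\n"

    dna_records = read_fasta(dna)
    print(len(dna_records), alphabet(dna_records))  # 2 ['A', 'C', 'G', 'T']
    print(alphabet(read_fasta(protein)))  # ['A', 'I', 'K', 'M', 'Q', 'R', 'T', 'Y']
```

A nucleotide file tops out at four core symbols, while a full protein file can use all twenty amino acid letters, which is why the DNA rendering looks less colorful.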

@ellisonbg , @jasongrout , and @jzf2101 let me know what you think. I don't mind looking for other datasets as well!

jasongrout commented 7 years ago

https://github.com/jupyter/notebook/pull/2368 means that you won't have to bump the limit in notebook 5.1 - only stdout/stderr are considered for limiting in 5.1.

Zsailer commented 7 years ago

Oh nice! That takes care of the large file problem...

Now, we'll just need to decide if you want a fasta file that includes DNA sequences (only 4 colors) or protein sequences (20 colors). The current file contains DNA sequences.

jzf2101 commented 7 years ago

Closes #33

jzf2101 commented 7 years ago

Also, I'm assuming the path to the fasta file should not be moved before the SciPy talk, since the demo currently uses the path to the data folder.