thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

Unable to import tutorial data #53

Closed: mschecht closed this issue 3 years ago

mschecht commented 3 years ago

Hi @thackl,

Thank you so much for developing this package! There is a HUGE need for open-source solutions to visualize genomic loci, and I am super excited to leverage this tool :)

I was trying your tutorial and unfortunately, I am stuck at the first step:

# parse sequence length and some metadata from fasta file
emale_seqs <- read_fai("emales.fna") %>%
  extract(seq_desc, into = c("emale_type", "is_typespecies"), "=(\\S+) \\S+=(\\S+)",
    remove=F, convert=T) %>%
  arrange(emale_type, length)

I tried following your "raw data" link, but the link is broken. I was, however, able to find emales.fna in your repo at data-raw/emales/emales.fna after unzipping emales.tgz. Is this the correct file to start the tutorial with?

When I use the file data-raw/emales/emales.fna this is the error I get:

> # parse sequence length and some metadata from fasta file
> emale_seqs <- read_fai("data-raw/emales/emales.fna") %>%
  extract(seq_desc, into = c("emale_type", "is_typespecies"), "=(\\S+) \\S+=(\\S+)",
          remove=F, convert=T) %>%
  arrange(emale_type, length)

Error: arrange() failed at implicit mutate() step. 
* Problem with `mutate()` input `..2`.
x Input `..2` must be a vector, not a primitive function.
ℹ Input `..2` is `length`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two. 
2: Too many col_names, ignoring extra ones
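
A note on the message above: dplyr reports an error of this kind when arrange() is given a name that is not a column of the table, so that `length` falls through to base R's length() function. The following is only a minimal sketch with a hypothetical toy table (the column names seq_id and seq_len are made up for illustration and are not what read_fai() returns); it does not use gggenomes at all:

```r
library(dplyr)

# Hypothetical toy table: note that there is no column named `length`
seqs <- tibble(seq_id = c("a", "b"), seq_len = c(20L, 10L))

# `length` is not a column here, so it resolves to base::length (a function),
# and arrange() refuses to sort by a function
res <- tryCatch(arrange(seqs, length), error = function(e) e)
print(inherits(res, "error"))

# Checking the available column names first shows what can be sorted on
print(names(seqs))
sorted <- arrange(seqs, seq_len)
print(sorted)
```

Running names() on the result of the read step is a quick way to see which columns actually came back before sorting.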

Thanks again for making this open-source package!

Cheers, Matt

thackl commented 3 years ago

Hi Matt,

glad you think this will be a useful package. And sorry about the issues with the showcase. I've pushed a few major updates over the last few weeks and haven't found the time to update the article. I am hoping to have a new tutorial up within a week or so.

The raw data files now live in inst/extdata/, which after package install should become just extdata/. The easiest way to use the raw example files now is via ex("emales/emales.fna")... Using ex() should guarantee that you always get the right path.
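
As a rough sketch of the path handling described above (assuming gggenomes is installed; the requireNamespace() guard is only there so the snippet does not fail otherwise), ex() can be compared with base R's system.file(), which is the general mechanism for locating files that a package ships under inst/:

```r
# Assumption: gggenomes is installed and bundles emales/emales.fna as stated above
if (requireNamespace("gggenomes", quietly = TRUE)) {
  # Path to the bundled example file via the package helper
  fa_path <- gggenomes::ex("emales/emales.fna")

  # General base-R lookup: files under inst/extdata/ in the source tree
  # are found under extdata/ in the installed package
  sys_path <- system.file("extdata", "emales", "emales.fna",
                          package = "gggenomes")

  print(fa_path)
  print(sys_path)
  print(file.exists(fa_path))
} else {
  message("gggenomes is not installed; nothing to look up")
}
```

The resulting path can then be passed to a read function in place of a hard-coded location such as data-raw/emales/emales.fna.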

There are also a few other breaking changes you might run into: for starters, I now always abbreviate "feature" with just "feat", because, well, I'm lazy ;)

For now, I can only suggest that you either wait a few days for the new tutorial or work off the function examples in the reference - those examples are all up to date. I particularly recommend having a look at the new family of convenience read-functions if you want to learn how best to import data from files.

Hope that helps, and feel free to report any other issues you encounter. Feedback is highly appreciated! Cheers, Thomas

thackl commented 3 years ago

OK, it's just a start, but it covers some of the basics: https://thackl.github.io/gggenomes/articles/gggenomes.html