Closed devonorourke closed 4 years ago
thanks @devonorourke for the report - i'll have a look
so we've run into this before - and there is documentation for it - see https://github.com/ropensci/bold/blob/master/R/bold_seqspec.R#L21-L28
Something isn't quite right on their end and you get back markers you don't ask for. So the only option is to filter by markercode after you get the data back.
Does that sort this out?
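For reference, a minimal sketch of that kind of filter. The data frame here is a made-up stand-in for what `bold_seqspec` returns (the real result has many more columns), and `res` and `coi` are just illustrative names:

```r
# Toy stand-in for a bold_seqspec() result; values are invented for illustration.
res <- data.frame(
  processid  = c("A1", "A2", "A3", "A4"),
  markercode = c("COI-5P", "28S", "COI-5P", NA),
  stringsAsFactors = FALSE
)

# Keep only COI-5P rows; the !is.na() guard drops records with no marker recorded.
coi <- res[!is.na(res$markercode) & res$markercode == "COI-5P", , drop = FALSE]
nrow(coi)
```

Filtering on the `markercode` column after the download sidesteps the server-side marker query entirely, at the cost of pulling more rows than needed.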
Why build all these columns if you can't query every one of them?
I ended up just downloading the entire dataset and filtering after the fact like you suggested.
So, it sorts it out on your end, but it doesn't help me any :) ha
> Why build all these columns if you can't query every one of them?
what does that mean? does it mean you want the function to filter the data inside the fxn?
The BOLD team are unresponsive to my contacts about their services so we can't change anything on their end to make, e.g., marker queries actually work.
This is 100% a comment about Barcode of Life setup, 0% about your R package. What I'm saying is it's weird to me that you can download their specimen information that has something like 70 columns, yet you can only filter a handful of these, right?
What would be great is to be able to apply a filtering function for any of these fields, and not be restricted to just `geo` or `marker` etc. (I think there are just 7 we can use from their online URL generator).
What if I wanted to be more specific than a country and search by lat/long? Or by date uploaded rather than just by institution? If the data is already in their database, I'm just wondering why it isn't set up on their end to leverage that additional information.
All it means on my end is needing to further filter after downloading the entire dataset, so no big deal, just a big file.
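As a rough sketch of what that after-the-fact filtering could look like for a lat/long bounding box: this assumes the downloaded table has numeric `lat` and `lon` columns, and the data, the `in_box` helper, and the coordinate ranges are all made up for illustration.

```r
# Toy stand-in for downloaded specimen data; values are invented.
specimens <- data.frame(
  processid = c("P1", "P2", "P3", "P4"),
  lat = c(43.1, 10.5, 44.0, NA),
  lon = c(-71.0, -84.2, -70.5, -72.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose coordinates fall inside a bounding box.
# Rows with missing coordinates are dropped.
in_box <- function(df, lat_range, lon_range) {
  keep <- !is.na(df$lat) & !is.na(df$lon) &
    df$lat >= lat_range[1] & df$lat <= lat_range[2] &
    df$lon >= lon_range[1] & df$lon <= lon_range[2]
  df[keep, , drop = FALSE]
}

northeast <- in_box(specimens, lat_range = c(42, 46), lon_range = c(-74, -69))
northeast$processid
```

The same pattern would work for any other column in the download, such as a date field, as long as it is parsed to a comparable type first.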
Modifying from the example on this repo, let's pull a tiny dataset of two Arthropod classes and see if we can extract just our `COI-5P` sequences (and filter out the non COI-5P markers):
So far so good, everything present. I'm going to select just two small non-Insect arthropod classes from that `x.checks` list:
Now let's apply that tiny list with `bold_seqspec` (not `bold_seq` like in Readme):
Pro: works great! Note, however, that there are several non-COI records in the `markercode` column. I didn't filter these out in the above argument, so that's okay!
Maybe we can filter these out by passing the `marker="COI-5P"` argument within the lapply function?
Crud, that didn't work.
I think this is because there are a pair of `markercode` columns: `marker_codes` and `markercode`.
I can filter these things after the fact with something like:
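A minimal sketch of one way to do this, assuming the `lapply` call yields a list of data frames (one per class) that each carry a `markercode` column. The list below is toy data and the names `per_taxon`, `ClassA`, and `ClassB` are placeholders, not the real query results:

```r
# Toy stand-in for the list returned by lapply(taxa, bold_seqspec); values invented.
per_taxon <- list(
  ClassA = data.frame(processid = c("X1", "X2"),
                      markercode = c("COI-5P", "18S"),
                      stringsAsFactors = FALSE),
  ClassB = data.frame(processid = c("Y1", "Y2"),
                      markercode = c("COI-5P", NA),
                      stringsAsFactors = FALSE)
)

# Filter each data frame to COI-5P; %in% treats NA as a non-match.
filtered <- lapply(per_taxon, function(df) {
  df[df$markercode %in% "COI-5P", , drop = FALSE]
})

# Stack the per-class results into a single data frame.
combined <- do.call(rbind, filtered)
nrow(combined)
```

Filtering before `rbind` keeps each intermediate piece small, which helps a little when the unfiltered downloads are large.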
Where am I going wrong? Thanks Scott!