AshleyWoods opened this issue 6 years ago
hey @AshleyWoods here is a small set of polishing things to do on the app:
- make sure the downloaded file has .csv on the end of its file name
- see ?occ_download_import and occ_get, which provides a minimal set of column ids to return. A small set is a good idea so that datasets don't get too huge.
- add rgbif documentation to the app on how to specify your GBIF username and password in your computer's environment variables; this will be better for user security. See ?occurrence_download for instructions on this.

On a broader scope we need to determine where we want to go next with this work. We have discussed a few different options here that I've listed below:
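On the credentials point, rgbif documents the environment variables GBIF_USER, GBIF_PWD, and GBIF_EMAIL for authentication. A sketch of what a user's .Renviron (or shell profile) might contain — the values are placeholders:

```
GBIF_USER=your_gbif_username
GBIF_PWD=your_gbif_password
GBIF_EMAIL=you@example.com
```

Keeping credentials here rather than in the app's code means they never end up in the repo or on screen.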
I'm down to just moving the specifications of the columns over to a .csv file and changing the name of the repo (on the list of small things). I am unsure of how to change the name of the repo, as when I attempted it the app broke because it no longer had the right path to any of its files. I'm not sure I did it correctly, though.
I am actually unsure of how to move column specification over to a .csv file either. However, I have picked the "broader scope" direction I'd like to pursue. I think tools for downsampling would be a good thing to have.
Hey @AshleyWoods were you able to download the csv file of the field names I created? Essentially my idea was that you would simply add columns to that csv file for different default sets of fields. A given set would specify which fields to include by marking those fields with 1's rather than 0's. Another useful column to add to this csv file would be metadata describing what each field represents. This has to exist somewhere on the GBIF website, or you should email them to find out where these are specified. Can you please describe exactly what the problem is with the csv file approach?
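A minimal sketch of the 0/1-column idea. The file name fields.csv, the set names, and the example rows below are all hypothetical — the real file would list every GBIF field:

```python
import csv
import io

# Hypothetical layout: one row per GBIF field, one 0/1 column per
# named set of default fields, plus a metadata description column.
FIELDS_CSV = """\
field,default,minimal,description
species,1,1,Scientific name matched to the GBIF backbone
decimalLatitude,1,1,Latitude in decimal degrees
decimalLongitude,1,1,Longitude in decimal degrees
issue,1,0,Data-quality issues flagged by GBIF
"""

def fields_in_set(csv_text, set_name):
    """Return the field names flagged with a 1 in the given set column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["field"] for row in reader if row[set_name] == "1"]

print(fields_in_set(FIELDS_CSV, "minimal"))
# → ['species', 'decimalLatitude', 'decimalLongitude']
```

Adding a new default set is then just adding a new 0/1 column, with no code changes needed in the app.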
No worries on renaming the repo. I will do that with you when I get back to town. With regards to breaking the app or not - have we discussed how to use branches in git? Branches can be very helpful for giving you the freedom you need to make changes to the code but still ensure the app is working on the master branch. Many repos maintain a master and dev branch for this purpose, slowly merging dev to master when they are sure features are working.
Very cool on your interest to try to tackle geographic sampling bias. Please start combing the literature for this topic in the field of species-distribution modeling where I think the most has been written. I'll also try to send you a few key papers on this topic. Then we can start to brainstorm how to best do this.
That approach to the csv makes much more sense than what I was trying to do. Thank you! And as for the down-sampling, I found this package: https://www.rdocumentation.org/packages/caret/versions/6.0-80/topics/downSample I think it would be good to include this as a "quick down-sample" option and have a second place for input where people could specify grid cell size and minimum occurrences as more of a custom option.
I think the nature of that function is kind of what we want, but we need something that applies to a landscape of coordinates.
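A sketch of the "custom" spatial option along those lines — binning coordinates into grid cells and thinning each cell. Every function and parameter name here is my own invention, not anything from caret or the app:

```python
import math
import random

def grid_downsample(coords, cell_size, min_occ, max_per_cell, seed=0):
    """Bin (lon, lat) points into square cells of cell_size degrees,
    drop cells with fewer than min_occ points, and randomly keep at
    most max_per_cell points in each remaining cell."""
    rng = random.Random(seed)
    cells = {}
    for lon, lat in coords:
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        cells.setdefault(key, []).append((lon, lat))
    kept = []
    for pts in cells.values():
        if len(pts) < min_occ:
            continue  # sparse cell: below the minimum-occurrence threshold
        kept.extend(rng.sample(pts, min(len(pts), max_per_cell)))
    return kept
```

Capping dense cells at max_per_cell is one simple way to flatten sampling intensity across the landscape; whether sparse cells should be dropped at all, rather than kept as-is, is a design choice worth discussing.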
Here are a few papers, I'll update this list as I encounter more - please move these to the wiki in time: Concepts papers
Methodological papers
I've added them to the wiki and I'll read over them. How do you create a branch? I'm trying to implement the .csv code and I don't want to break anything.
Nevermind, I believe I just found it.
I finished implementing it, but I am unsure of how to merge the branch back into the master branch.
So the ideal way this would be done is for you to push that branch to GitHub and then use GitHub's pull request mechanism to provide a public review of your merge. This gives me an opportunity to provide code review. This may have been such a small change that you don't want a review; in that case you can just merge the branch into master locally and then push master to GitHub.
Pull request option:
While on your branch
git push origin my_new_branch
Then go to GitHub and click on your new branch; there should be a pull request option there.
Local merge option:
git checkout master
git merge my_new_branch
git push origin master
Dan
One thing to note about the pull request option described above: if you take this approach and we merge your new_branch into the remote master branch, then your local master branch will be out of sync with the remote. You can resync them by doing the following:
git pull origin master
For completeness, a workflow that includes forking and pull requests (the typical collaborative setup for using GitHub) can be carried out this way (these instructions are for the repo MoBiodiv/mobr
but they can be generalized to any repo):
1) Fork the repo to your local GitHub account
2) Clone your forked version of the repo to your machine
git clone git@github.com:your_user_name/mobr.git
3) Link your local repo back to the master on MoBiodiv
git remote add upstream git@github.com:MoBiodiv/mobr.git
4) Create a branch for your changes
git branch new_function
5) Checkout your branch
git checkout new_function
6) Make your commits on that branch and when you are done push it to your forked copy of the repo
git push origin new_function
7) Submit a pull request on the GitHub website by going to your forked copy of the repo and clicking on the pull request button
8) After your changes are merged with master you'll want to pull that update into your copies of the repo as well.
git pull upstream master
git push origin master
# delete your branch as it's no longer needed
git branch -d new_function
Before you start work on the project in the future, you'll want to repeat step 8 so that your version of the repo does not become out-of-sync with the main repository.
I was thinking more about our vision for moving the app forward and it does seem like the literature has quite a bit of methods described for how to detect bias in the datasets. Maybe the simplest first step is to provide the user tools to detect bias in the chunk of data they extracted. Then the burden can be on them to decide what to do about it. Let me know what you think about this idea.
I like that idea. It also allows us to avoid altering the data in a way that the user doesn't want or like. (like you mentioned with the p values some programs give)
Oops, didn't mean to close the issue.
I am having issues finding more literature on the subject of correcting bias (the papers we have linked seem to have the most common/useful methods) and cannot seem to find any at all for detecting bias. All that comes up are papers that say "we need an easy way to detect bias" but offer no real solution.
Now that the file download is running all the functionality discussed in our last meeting has been implemented. That leaves the question of what to do next.
I also wanted to ask about the choice of default columns. I picked the ones I thought may be the most useful, but I am unsure if I added ones that should be left off or left off ones that should be added. Could you look at the list of columns and the list of ones I picked to be the defaults? (They're all typed out in the code and I REALLY don't want to have to type them all again.) There are 235 columns total and I have picked out 29.