lizzieinvancouver / egret

Cleaning coordinates #21

Open DeirdreLoughnan opened 3 weeks ago

DeirdreLoughnan commented 3 weeks ago

@lizzieinvancouver @christophe-rd Since this is turning into a more complex issue, I am starting a specific issue for it.

Can you give a quoted example from the methods or such?

As a few examples with seeds collected in urban areas:

  1. For Dehgan84, "seeds were obtained from Mr. Alan Shapiro of San Felasco Nursery, Gainesville FL in April 1983." This nursery still exists, so we could use its lat/long and assume it has not moved since the 1980s.

  2. For li11, "Dispersal units....were collected...from coastal saline soils of Huanghuan City, Hebei Province of north China." Here I would take the central lat/long of the city.

  3. Meyer94 gives the trail/mountain name in Table 1, which also includes the county name. So these could be extracted.

I agree that we should have yet another column that flags these, noting something like 'lat long coarse scale'.

I can also start a txt called methodsCheck that lists out these things we need to remember during analyses.

DeirdreLoughnan commented 3 weeks ago

@christophe-rd Have you started taking notes anywhere on how Tolu cleaned the coordinates? At the top of the "cleanCoordinates.R" file, I have added the URL for the lat/long converter and instructions for how to extract the lat/long from a region on Google Earth.

@buniwuuu We could use your help getting a rough estimate for the 117 papers that are missing lat/long information. Tolu has already started the process. As written at the top of the "cleanCoordinates.R" code, we use what information we have from the paper on the source population of the seeds to find the general region on Google Earth and then drop a pin to get the lat/long for the centre of that region. I include instructions on how to change the Google Earth settings so it gives values in decimal degrees. Values to two decimal places are sufficient.

What we need your help with:

Can you give us an update on your progress, including how many studies you still have to search, by Friday June 14th, or earlier if you finish before then?
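
For papers that report coordinates in degrees, minutes and seconds, the decimal-degree convention described above (values rounded to two decimal places) can be sketched as follows. This is a minimal Python illustration only; the project's own code lives in "cleanCoordinates.R", and the function name `dms_to_decimal` and its arguments are hypothetical, not taken from that file.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    Hypothetical helper for illustration; not part of cleanCoordinates.R.
    The result is rounded to two decimal places, as suggested in the thread.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"Unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("Minutes and seconds must be in the range [0, 60)")

    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative in decimal degrees.
    if hemisphere in {"S", "W"}:
        value = -value
    return round(value, 2)
```

Taking the sign from the hemisphere letter, rather than from the degrees value, avoids ambiguity when a paper writes something like "123° W" without a minus sign.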

christophe-rd commented 3 weeks ago

@DeirdreLoughnan, no I didn't take notes on this. Tolu told me she didn't write up any notes on how she looked for the coordinates. Therefore, I didn't have anything to write... I edited her code to make it easier to read. Let me know if I can do something to help!

buniwuuu commented 3 weeks ago

@DeirdreLoughnan I am going to be in Hawaii for the next week...but I will try to do as many as I can and report back to you!

lizzieinvancouver commented 3 weeks ago

no I didn't take notes on this. Tolu told me she didn't write up any notes on how she looked for the coordinates. Therefore, I didn't have anything to write...

We could reach out to @toluam and ask for whatever notes she has or what she remembers (and document that she did).

lizzieinvancouver commented 3 weeks ago

As a few examples with seeds collected in urban areas:

  1. For Dehgan84, "seeds were obtained from Mr. Alan Shapiro of San Felasco Nursery, Gainesville FL in April 1983." This nursery still exists, so we could use its lat/long and assume it has not moved since the 1980s.
  2. For li11, "Dispersal units....were collected...from coastal saline soils of Huanghuan City, Hebei Province of north China." Here I would take the central lat/long of the city.
  3. Meyer94 gives the trail/mountain name in Table 1, which also includes the county name. So these could be extracted.

@DeirdreLoughnan I agree with you that we can pull lat/lon for these (as you suggest), but I would suggest a new category in the lat/long coarse column, 'nursery', because I think that is different and may happen a few times. (Also, I would NOT include a slash in a column name, as it could cause issues someday; an underscore or similar is better.) Then we have: no entry, coarse, and coarsenursery or such.

Thanks also to @christophe-rd for bringing this up and figuring out what to do and @buniwuuu .
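
The flag categories and column-naming advice above can be sketched as follows. This is a minimal Python illustration, assuming that "no entry" means the coordinates came directly from the paper; the function names `safe_column_name` and `coordinate_flag` are hypothetical, and the project's actual code is in R.

```python
import re


def safe_column_name(name):
    """Replace slashes, spaces and other non-alphanumeric runs with underscores.

    Hypothetical helper for illustration only.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def coordinate_flag(reported_in_paper, from_nursery=False):
    """Return the coarse-scale flag for one seed source.

    Assumed categories, following the thread: an empty string (no entry) when
    the paper reports coordinates, 'coarsenursery' when they were estimated
    from a nursery's location, and 'coarse' for any other estimate.
    """
    if reported_in_paper:
        return ""
    return "coarsenursery" if from_nursery else "coarse"
```

For example, `safe_column_name("lat/long coarse")` gives a name with underscores in place of the slash and the space. Further source types mentioned in the thread, such as seed banks, could be added as extra categories in the same way.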

DeirdreLoughnan commented 2 weeks ago

Thanks @lizzieinvancouver!

I agree that having an additional "nursery" category is good; I also recall seeing a few sources that are seed banks, etc.

@christophe-rd could you follow up with Tolu and ask her to write up what she remembers doing? Point-form notes would be fine.

Thanks @buniwuuu for working on this! Let us know what progress you make and how long it took! Once we are all back from vacation we can regroup and divide tasks amongst more people to make sure it can get done by early July!

christophe-rd commented 2 weeks ago

@lizzieinvancouver @DeirdreLoughnan @buniwuuu Of course! Here's what she told me:

I added the lat/long info for all the data I had access to at the time (pre-summer 2023), but we did a lot of scraping over the summer that I didn't work through cleaning. I also didn't officially write up my methods but I added a lot of comments to the cleaning code that you could use as a loose methodology.

I could reach out again to her, but I doubt she'll be able to give us more info...

buniwuuu commented 1 week ago

Hi @lizzieinvancouver @DeirdreLoughnan @christophe-rd, I finished locating the leftover papers! 30 papers didn't provide any location (listed in na.coords.id), and for some papers I had to look up specific maps online to find the location (rivers and creeks). I left comments for any paper where I had to choose one location from multiple search results in Google Earth or look the location up on Google.

I will go through them again tomorrow to make sure they are all seed sources, not experiment locations.

buniwuuu commented 1 week ago

@DeirdreLoughnan for some papers, finer-scale locations were not scraped into the data. Should I add a new column to correct them?

DeirdreLoughnan commented 1 week ago

@buniwuuu thanks for your help with this! It is great to hear only 30 papers have no location information.

Could you clarify what you mean by finer-scale locations? Do you mean the entries that are really vague, like Canada or Manitoba, or counties? Perhaps we could chat about this briefly after lab meeting today.

buniwuuu commented 1 week ago

@DeirdreLoughnan for example, one paper mentioned the town, but only the province was scraped.

DeirdreLoughnan commented 1 week ago

@buniwuuu thanks for catching that! We should be as specific as possible, so scraping the town (assuming the seeds were collected in the town) is what we want.
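
The "as specific as possible" rule above can be sketched as picking the finest-scale location a paper provides. This is a minimal Python illustration; the field names (`town`, `county`, `province`, `country`) and the function `finest_location` are hypothetical and do not reflect the actual column names in the scraped data.

```python
# Assumed ordering of location scales, from finest to coarsest.
SCALES = ("town", "county", "province", "country")


def finest_location(record):
    """Return (scale, value) for the most specific non-empty location field.

    `record` is a dict with any subset of the hypothetical keys in SCALES.
    Returns None when the paper gives no usable location at all.
    """
    for scale in SCALES:
        value = record.get(scale)
        if value and value.strip():
            return scale, value.strip()
    return None
```

A record with both a town and a province would resolve to the town, and a record with no location fields would return `None`, matching the papers that had to be listed as having no coordinates.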