ropensci / allodb

An R package for biomass estimation at extratropical forest plots.
https://docs.ropensci.org/allodb/
GNU General Public License v3.0

Reference table #57

Closed gonzalezeb closed 6 years ago

gonzalezeb commented 6 years ago

I cannot find the reference table; I only see reference_metadata in csv_database. I thought I had kept that table there, where I would enter the reference data (citation) for each equation.

But I just realized a big mistake. In the equations table we have a 'ref_id' but I never gave an id to each reference (each row) in the reference table (now lost).

I think I need help from @maurolepore

maurolepore commented 6 years ago

@gonzalezeb, can you find the last commit where the table you need is still there?

https://github.com/forestgeo/allodb/commits/master


--

You could do a manual binary search.

Example: Take these 9 points in history: 1, 2, 3, 4, 5, 6, 7, 8, 9
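A minimal sketch of that binary search over the 9 points, in Python. The function name `last_good` and the check `table_present` are hypothetical, and the assumption that the table disappears after point 6 is invented purely for illustration. The search only works if the table is present at every point up to some commit and absent at every point after it.

```python
def last_good(points, is_good):
    """Return the last point where is_good(point) is True, or None.

    Assumes is_good is True for a leading run of points and False for
    every point after that (the table existed, then was removed).
    """
    lo, hi = 0, len(points) - 1
    answer = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_good(points[mid]):
            answer = points[mid]  # table still here; look at later points
            lo = mid + 1
        else:
            hi = mid - 1          # table already gone; look at earlier points
    return answer


history = [1, 2, 3, 4, 5, 6, 7, 8, 9]
checked = []  # records which points we had to inspect


def table_present(point):
    # Hypothetical check: pretend the table was last present at point 6.
    checked.append(point)
    return point <= 6


result = last_good(history, table_present)
print(result, checked)
```

With 9 points this inspects only 3 of them instead of all 9. In practice each check means looking at one commit by hand; Git's built-in `git bisect` command automates the same idea.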

gonzalezeb commented 6 years ago

The second commit below

image

gonzalezeb commented 6 years ago

commit 3a8d5b8c711fd602ba807fe69a001eb489b5a7f2

maurolepore commented 6 years ago

This is how I rescued references.csv:

git checkout 3a8d5b8c711fd602ba807fe69a001eb489b5a7f2

Manually copy the file into another folder

git checkout master

Manually paste the file into data-raw/

maurolepore commented 6 years ago

Now we are still left with the problem of linking id to references, right?

Can you please find a commit where there is a master table that contains both the references and equation_allometry? I think I could match on equation_allometry to try to find the id.

gonzalezeb commented 6 years ago

Commit f8962bc4ce00c625fe20183c93184a1f5904240c seems to be the last one where the master table and equation_allometry were together.

But the real problem is that I never assigned a ref_id. I think my idea was to use something like the first 4 letters of the last name, then the year, e.g. Clar_1985 (I prefer this to numbers). Maybe I have to do that by hand?
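This id scheme does not have to be applied by hand. Below is a minimal Python sketch, assuming the last name and year are already available as separate values; the function name `author_year_id` and the author name "Clark" are hypothetical, chosen only to reproduce the Clar_1985 example.

```python
def author_year_id(last_name, year):
    """Build an id from the first 4 letters of the last name and the year."""
    # Slicing is safe for short names: "Ng"[:4] is just "Ng".
    return f"{last_name[:4]}_{year}"


example_id = author_year_id("Clark", 1985)
print(example_id)
```

Note that two different references by the same first author in the same year would get the same id under this scheme.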

maurolepore commented 6 years ago

OK, thanks! I'll have a look tomorrow.

maurolepore commented 6 years ago

@gonzalezeb, I think I got something useful for you to tweak and finalize the references table.

I added the file data-raw/data_references_id.csv which contains the new column author_year following your suggestion. I have not overwritten ref_id because it is not entirely NAs. Please check that problem, and merge the two columns ref_id and author_year into one.

What will happen if two equations have the same combination of author and year? It is kind of working for now, but I'm not 100% sure it's safe in the long run. Also, there are 32 references in data_references.csv but 35 in data_references_id.csv: please see what's going on.
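Both checks can be sketched in a few lines of Python. This is only an illustration: the function names and the ids below are made up, and the real values would have to be read from data_references.csv and data_references_id.csv.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the ids that occur more than once, sorted."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


def only_in_first(first, second):
    """Return the ids present in `first` but missing from `second`, sorted."""
    return sorted(set(first) - set(second))


# Made-up ids standing in for the two reference tables.
ids_in_references = ["Clar_1985", "Smit_1990"]
ids_in_references_id = ["Clar_1985", "Clar_1985", "Smit_1990", "Jone_2001"]

dupes = duplicated_ids(ids_in_references_id)
extra = only_in_first(ids_in_references_id, ids_in_references)
print(dupes, extra)
```

A non-empty `dupes` means the author and year combination is not unique; a non-empty `extra` points at rows that exist in one table but not the other.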

Once you are done you may want to move the clean table to data-raw/csv_database/ and continue editing it. Also it'd be nice to tidy data-raw/ by removing the leftover master and references tables.


gonzalezeb commented 6 years ago

To your question, "What will happen if two equations have the same combination of author and year?": let's use the system Krista used in a previous publication for the citation (reference) ID:

Citation ID in the form [last name of first author][year][first letter of first four words of title, when applicable].
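A minimal Python sketch of that format follows. The lowercase output, the underscore separators, and the function name `citation_id` are assumptions; the description above fixes only the three components and their order. The example title is invented for illustration.

```python
def citation_id(last_name, year, title=None):
    """Build [last name][year][initials of the first four title words].

    The title part is optional ("when applicable"); pass title=None to skip it.
    Lowercasing and "_" separators are assumptions, not part of the spec.
    """
    parts = [last_name.lower(), str(year)]
    if title:
        # First letter of each of the first four whitespace-separated words.
        initials = "".join(word[0].lower() for word in title.split()[:4])
        parts.append(initials)
    return "_".join(parts)


with_title = citation_id("Clark", 1985, "Hypothetical title about tree biomass")
without_title = citation_id("Clark", 1985)
print(with_title, without_title)
```

Adding the title initials makes collisions between same-author, same-year references much less likely, although it does not rule them out.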

The final reference table should have the following columns:

ref.id, ref.doi, ref.author, ref.year, ref.title, ref.journal

maurolepore commented 6 years ago

@gonzalezeb

I updated data-raw/data_references_id.csv to reflect your request. You'll need to check the data, remove ref_id, then rename refid as ref_id.

Let me know your questions.


maurolepore commented 6 years ago

@gonzalezeb, the problem is that this new id doesn't help link the equations table to the references table -- which is what we wanted in the first place. That is because the master table we recovered has no information about the reference title. It only has biomass_equation_source, which contains author and year.

I think I could use author_year once, as I was doing before, to first link the two tables, i.e. references and equation. Then create a reference id formatted as you requested and use that format from then on.
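The two-step plan can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the repository: the rows, titles, and helper names are invented, tables are plain lists of dicts, and the new id format (lowercase, underscores) is assumed.

```python
def new_ref_id(ref):
    """Final id: last name, year, initials of the first four title words."""
    initials = "".join(w[0].lower() for w in ref["title"].split()[:4])
    return f"{ref['author'].lower()}_{ref['year']}_{initials}"


def link_equations(equations, references):
    """Step 1: match on author_year. Step 2: attach the new ref_id."""
    by_key = {}
    for ref in references:
        by_key.setdefault(ref["author_year"], []).append(ref)

    linked = []
    for eq in equations:
        matches = by_key.get(eq["author_year"], [])
        # Only link when the match is unambiguous; otherwise leave None
        # so the row can be resolved by hand.
        ref_id = new_ref_id(matches[0]) if len(matches) == 1 else None
        linked.append({**eq, "ref_id": ref_id})
    return linked


# Invented rows standing in for the references and equations tables.
references = [
    {"author_year": "Clar_1985", "author": "Clark", "year": 1985,
     "title": "Hypothetical title about tree biomass"},
]
equations = [
    {"equation_id": "eq1", "author_year": "Clar_1985"},
    {"equation_id": "eq2", "author_year": "Smit_1990"},
]

linked = link_equations(equations, references)
print(linked)
```

After this one-off join, author_year is no longer needed as a key, and both tables can rely on the new ref_id.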

maurolepore commented 6 years ago

@gonzalezeb, I believe this closes this issue: https://github.com/forestgeo/allodb/blob/master/inst/issues/57.md
