inbo / fish-tracking

🐟 Collection of scripts for processing and analysing fish tracking data

Verify script mapping of historical data #44

Closed peterdesmet closed 8 years ago

peterdesmet commented 8 years ago

Hi @PieterjanVerhelst, I noticed that 1) you've re-uploaded the historical data to the "1. Raw" folder and 2) the manual mapping of station names I did for that file is no longer in "2. Verified". I still have it on my computer though.

If I understand correctly, we want to do all mapping with a script (which is good, as it can be reproduced), but I want to make sure that the removal of the file is on purpose. I remember that not all mapping in that file was straightforward.

PieterjanVerhelst commented 8 years ago

Indeed, I downloaded the historical data from the verified folder to delete the records with the wrong date. However, when I uploaded it again to verified, the file was only half the size it should have been (70 MB instead of 150 MB). Therefore, I uploaded the original historical data again to the raw folder. Unfortunately, the manual mapping was lost in the process.

bartaelterman commented 8 years ago

Yes, this was a direct consequence of #43.

peterdesmet commented 8 years ago

The file is about half the size because I removed a lot of unused columns during my manual mapping. The number of records is the same.
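A quick way to confirm that two versions of a file differ only in columns and not in records is to count the data rows in each. The sketch below is only an illustration: the column names and sample rows are made up, and it reads from in-memory strings rather than the real files.

```python
import csv
import io

def count_records(csv_text):
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)

# Toy data with hypothetical column names: the second version has fewer columns.
original = "receiver,station,unused_a,unused_b\nr1,s1,x,y\nr2,s2,x,y\n"
trimmed = "receiver,station\nr1,s1\nr2,s2\n"

same_count = count_records(original) == count_records(trimmed)
print("Same number of records:", same_count)
```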

I'll review all the mapping steps I did in Google Refine and check whether they are reflected in https://github.com/LifeWatchINBO/fish-tracking/blob/master/data/station_names.csv. That way, we can easily repeat the mapping process if necessary.

peterdesmet commented 8 years ago

On second thought, I'll wait till @bartaelterman has run his script on the historical data and I'll verify if it is the same as my manual mapping.

peterdesmet commented 8 years ago

We've just compared the manual mapping with the scripted mapping on the 2+ million historical records and only 2 receivers have a different new station name! :dancers: :dancers: Great that there are so few!

That means that the scripted mapping works as it should! The 2 differences are due to a missing/incorrect value in the mapping file or incorrect manual mapping:

1: VR2W-113521

Braakman4 (no space) is mapped to bh-32 by the script and to bh-33 manually. Since Braakman 4 (with space) is mapped to bh-32, I think the script is correct. All records are from receiver VR2W-113521.

2: VR2W-120883

8 ISO 8s 6 is mapped to ws-6 by the script and to ws-30 manually. In the Google Spreadsheet, I also read ws-6, so I think the script is correct again. All records are from receiver VR2W-120883.

@PieterjanVerhelst can you verify?
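For reference, a comparison like the one described above could be sketched as follows. This is not the script that was actually used: the function name and the tuple layout are hypothetical, and the sample rows are a toy set built from the values quoted in this comment (the repeat count is illustrative only).

```python
from collections import defaultdict

def find_mapping_differences(records):
    """Count records where the manual and scripted station names disagree.

    `records` is an iterable of (receiver, old_name, manual_name, scripted_name)
    tuples. This layout is an assumption made for the example.
    """
    differences = defaultdict(int)
    for receiver, old_name, manual, scripted in records:
        if manual != scripted:
            differences[(receiver, old_name, manual, scripted)] += 1
    return dict(differences)

# Toy sample; the real comparison ran over 2+ million records.
sample = [
    ("VR2W-113521", "Braakman4", "bh-33", "bh-32"),
    ("VR2W-113521", "Braakman4", "bh-33", "bh-32"),
    ("VR2W-120883", "8 ISO 8s 6", "ws-30", "ws-6"),
    ("VR2W-120883", "8 ISO 8s 6", "ws-30", "ws-6"),
]

for (receiver, old_name, manual, scripted), count in find_mapping_differences(sample).items():
    print(f"{receiver} {old_name!r}: manual={manual}, script={scripted} ({count} records)")
```

Grouping by receiver and old station name keeps the report short even when millions of records share the same disagreement.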

PieterjanVerhelst commented 8 years ago

Braakman4 (without space) is actually Braakman 5 and gets the new station name bh-33. Records for Braakman 4 (with space) normally come from receiver VR2W-112298. 8 ISO 8s 6 is indeed ws-6.

bartaelterman commented 8 years ago

Since both Braakman4 (no space) and Braakman 4 (with space) are in the mapping file as separate entries, the mapping of Braakman4 can easily be fixed by updating the corresponding line in the mapping file.
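To illustrate why this is a one-line fix: if the mapping is an exact-match lookup from old name to new name, the two spellings are independent keys. The dictionary below is a hypothetical in-memory stand-in for the mapping file (its real column layout is not shown in this thread), and `map_station` is an illustrative helper, not a function from the repository.

```python
# Hypothetical stand-in for the mapping file: old station name -> new station name.
station_mapping = {
    "Braakman 4": "bh-32",
    "Braakman4": "bh-32",  # the entry that turned out to be wrong
    "8 ISO 8s 6": "ws-6",
}

def map_station(old_name, mapping):
    """Return the new station name for an exact match, or None if unmapped."""
    return mapping.get(old_name)

# Updating one entry changes only the no-space spelling.
station_mapping["Braakman4"] = "bh-33"

print(map_station("Braakman4", station_mapping))
print(map_station("Braakman 4", station_mapping))
```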

PieterjanVerhelst commented 8 years ago

I have changed bh-32 to bh-33 for Braakman4 in the mapping file.

peterdesmet commented 8 years ago

Great. Closing issue.