mhpob / otndo

A package to understand OTN data
https://otndo.obrien.page/
Creative Commons Attribution 4.0 International

PI / summary merge can break down #11

Closed: jdpye closed this 4 months ago

jdpye commented 4 months ago

Got a fun error when processing all the OTN NSBS data (open, attached)

```
> otndo::make_tag_push_summary('nsbs_matched_detections_all.csv')
ℹ Asking OTN GeoServer for project information...
ℹ Writing report...

processing file: make_tag_push_summary.qmd
  |.....................                     |  49% [station-summary-table]    
Quitting from lines  at lines 309-356 [station-summary-table] (make_tag_push_summary.qmd)
Error in `vecseq()`:
! Join results in 1729 rows; more than 1464 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
Backtrace:
 1. base::merge(station_summary, pis[, .(detectedby, PI)], by = "detectedby")
 2. data.table::merge.data.table(...)
 4. data.table::`[.data.table`(...)
 5. data.table:::vecseq(...)

Execution halted
```

Looks like the PI/station_summary merge returns more rows than there are in the data file. Could this be a problem with our assumptions about the uniqueness of station names within a project, or is there another issue for us to chase down? Willing to bet it's a source data foible.

The source file was too big to attach; I'll ship it to you via Slack.

mhpob commented 4 months ago

The HFX project had two instances of PI metadata: one from before and one from after a PI was added. This caused the one-to-many match error.

```
                                                                                                           contact_pi detectedby
                                                                                                               <char>     <char>
1:                                       Dave Hebert (david.hebert@dfo-mpo.gc.ca), Fred Whoriskey (fwhoriskey@dal.ca)        HFX
2: Dave Hebert (david.hebert@dfo-mpo.gc.ca), Robert Lennox (robert.lennox@dal.ca), Fred Whoriskey (fwhoriskey@dal.ca)        HFX
```
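To illustrate why a second PI row per project inflates the merge, here is a minimal base-R sketch on toy data. The station names (`HFX001`, `HFX002`) and the shortened `PI` strings are hypothetical. Base `merge()` on plain data frames silently returns the inflated result; data.table's merge only stops once the result exceeds `nrow(x)+nrow(i)`, as in the error above, so small cases like this one can slip through either way.

```r
# Hypothetical toy data loosely modeled on the HFX case: two stations,
# and a PI lookup table holding two rows for the same project
station_summary <- data.frame(
  detectedby = c("HFX", "HFX"),
  station = c("HFX001", "HFX002")
)
pis <- data.frame(
  detectedby = c("HFX", "HFX"),
  PI = c("Dave Hebert, Fred Whoriskey",
         "Dave Hebert, Robert Lennox, Fred Whoriskey")
)

# Every station row pairs with every PI row sharing its key: 2 x 2 = 4 rows
merged <- merge(station_summary, pis, by = "detectedby")
nrow(merged)  # 4

# A pre-merge check: which keys occur more than once in the lookup table?
dup_keys <- unique(pis$detectedby[duplicated(pis$detectedby)])
dup_keys  # "HFX"
```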

The fix checks whether multiple instances of PI information exist for a project and, if so, combines those instances, extracts the unique character elements, and replaces the PI/POC/PI_emails/POC_emails fields with the combined instance.
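The approach described above can be sketched in base R as follows. This is not otndo's actual code: `combine_unique()` is a hypothetical helper, it assumes individual contacts within a PI string are separated by commas, and it handles a single column (the same treatment would be repeated for the POC and email columns). The `PI` strings are taken from the HFX output above; the station names are made up.

```r
# Hypothetical helper (not otndo's implementation): split each PI string on
# commas, drop repeated contacts, and paste the survivors back together
combine_unique <- function(x) {
  parts <- trimws(unlist(strsplit(x, ",")))
  paste(unique(parts), collapse = ", ")
}

# PI lookup table with two rows for HFX, as in the output above
pis <- data.frame(
  detectedby = c("HFX", "HFX"),
  PI = c(
    "Dave Hebert (david.hebert@dfo-mpo.gc.ca), Fred Whoriskey (fwhoriskey@dal.ca)",
    "Dave Hebert (david.hebert@dfo-mpo.gc.ca), Robert Lennox (robert.lennox@dal.ca), Fred Whoriskey (fwhoriskey@dal.ca)"
  )
)

# Collapse to one row per project; single-row projects are effectively unchanged
pis_clean <- aggregate(PI ~ detectedby, data = pis, FUN = combine_unique)
pis_clean$PI

# With unique keys, the merge keeps one row per station
station_summary <- data.frame(
  detectedby = c("HFX", "HFX"),
  station = c("HFX001", "HFX002")  # hypothetical station names
)
merged <- merge(station_summary, pis_clean, by = "detectedby")
nrow(merged)  # 2
```

`unique()` keeps first-seen order, so the original PIs stay at the front of the combined string and later additions (here, Robert Lennox) are appended.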

jdpye commented 4 months ago

That did the trick. Appreciated, Mike!

Changing PIs with time is a tricky proposition!

mhpob commented 3 months ago

Tests added to confirm this is handled:

https://github.com/mhpob/otndo/blob/main/tests/testthat/test-project_contacts.R#L114

https://github.com/mhpob/otndo/blob/main/tests/testthat/test-project_contacts.R#L141