Background
This is a follow-up to the performance investigation related to cloud-based hubs: #37
When researching the above, we found two places in connect_hub and connect_model_output that affect performance when using hubData with larger hubs based on S3:
the check_file_format function
using exclude_invalid_files = TRUE (of the two, this is the larger performance impact)
To ensure users of cloud-based hubs can connect in a reasonable amount of time, we decided to add an optional parameter to connect_hub and connect_model_output that will specify whether or not to perform the above checks. When not supplied by the user, the default behavior will be to skip them when working with cloud-based hubs.
Definition of done
[ ] connect_hub has an optional parameter to skip the checks
[ ] connect_model_output has an optional parameter to skip the checks
[ ] when the new parameter isn't specified by the user, the code will set it to skip checks when working with cloud-based data
[ ] when there's an error opening the files while checks are skipped, the message should include a note about trying again with checks enabled
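A rough sketch of the last two items, in base R only. Every name here (`skip_checks`, `is_cloud_hub`, `resolve_skip_checks`, `open_with_hint`) is hypothetical and chosen for illustration; none of it is the existing hubData API, and how a cloud-based hub is actually detected is left to the caller.

```r
# Hypothetical helper: decide whether the checks should be skipped.
# `skip_checks = NULL` stands for "not supplied by the user".
resolve_skip_checks <- function(skip_checks = NULL, is_cloud_hub = FALSE) {
  if (is.null(skip_checks)) {
    # Default: skip the checks only for cloud-based hubs.
    return(isTRUE(is_cloud_hub))
  }
  stopifnot(
    is.logical(skip_checks),
    length(skip_checks) == 1L,
    !is.na(skip_checks)
  )
  skip_checks
}

# Hypothetical wrapper: run `open_fn` (whatever opens the files) and, if it
# fails while checks are skipped, re-raise the error with a note appended.
open_with_hint <- function(open_fn, skip_checks) {
  tryCatch(
    open_fn(),
    error = function(e) {
      msg <- conditionMessage(e)
      if (isTRUE(skip_checks)) {
        msg <- paste0(
          msg,
          "\nNote: file checks were skipped. ",
          "Try again with checks enabled (skip_checks = FALSE)."
        )
      }
      stop(msg, call. = FALSE)
    }
  )
}
```

With this shape, an explicit `TRUE` or `FALSE` from the user always wins, and the cloud-based default only applies when the argument is left as `NULL`.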