open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Adjust process_dataset to avoid OOM errors #552

Closed: skearnes closed this 3 years ago

skearnes commented 3 years ago

GitHub Actions is running out of memory when it tries to load all of the USPTO grant data at once.
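One common way to avoid this kind of OOM is to handle one dataset at a time rather than holding every dataset in memory together. The following is a minimal sketch of that idea, not the actual change in this PR; `iter_datasets`, `validate_all`, and the `load` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def iter_datasets(
    paths: Iterable[str], load: Callable[[str], List[str]]
) -> Iterator[Tuple[str, List[str]]]:
    """Yields (path, dataset) pairs lazily, one file at a time (hypothetical helper)."""
    for path in paths:
        # Nothing is loaded until the caller asks for the next item, so at most
        # one dataset needs to be resident in memory at any point.
        yield path, load(path)


def validate_all(paths: Iterable[str], load: Callable[[str], List[str]]) -> int:
    """Counts records across all datasets without keeping them all in memory."""
    total = 0
    for _, dataset in iter_datasets(paths, load):
        total += len(dataset)  # Stand-in for real per-dataset validation.
    return total
```

Peak memory then scales with the largest single input file instead of the sum of all inputs, at the cost of not being able to run checks that need every dataset loaded together.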

codecov[bot] commented 3 years ago

Codecov Report

Merging #552 (2e8e9c2) into main (5949003) will increase coverage by 0.03%. The diff coverage is 73.33%.


@@            Coverage Diff             @@
##             main     #552      +/-   ##
==========================================
+ Coverage   76.04%   76.07%   +0.03%     
==========================================
  Files          19       19              
  Lines        1862     1877      +15     
  Branches      454      457       +3     
==========================================
+ Hits         1416     1428      +12     
- Misses        305      306       +1     
- Partials      141      143       +2     

Impacted Files                          Coverage Δ
ord_schema/scripts/process_dataset.py   79.47% <73.33%> (+0.05%) ↑

skearnes commented 3 years ago

Thanks; waiting on #551 so I can make process_dataset output .pb.gz instead of .pb.

skearnes commented 3 years ago

FYI added a new flag --output_format that defaults to '.pb.gz'.
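For illustration, a flag like this could be wired up roughly as follows. This is a sketch under assumptions, not the code in process_dataset.py: it uses `argparse` (the script's real flag library is not shown here), it assumes '.pb' and '.pb.gz' are the only accepted values, and `write_output` is a hypothetical helper that takes already-serialized bytes.

```python
import argparse
import gzip


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Flag name and default are taken from the comment above; the list of
    # choices is an assumption made for this sketch.
    parser.add_argument("--output_format", default=".pb.gz", choices=[".pb", ".pb.gz"])
    return parser


def write_output(data: bytes, basename: str, output_format: str) -> str:
    """Writes serialized bytes, gzip-compressing when the format ends in '.gz'."""
    filename = basename + output_format
    # gzip.open and open share the same file-like interface in binary mode.
    opener = gzip.open if output_format.endswith(".gz") else open
    with opener(filename, "wb") as f:
        f.write(data)
    return filename
```

With no arguments, `build_parser().parse_args([]).output_format` is '.pb.gz', so compressed output is what you get unless '.pb' is requested explicitly.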