Open suzukimilanpaak opened 8 years ago
I used the same terminology as the CRAN API to avoid unnecessary confusion for other readers of the code.
A package has_many descriptions, and a package caches its current_description.
           ┌────────────────┐
           │                │
           │                │
           │    packages    │──┐
           │                │  │
           │                │  │
           └────────────────┘  │
                    ┬          │
                    │          │
                    │ has_many │ has_one :current_version
                    │          │
                    ▼          │
           ┌────────────────┐  │
           │                │  │
           │  descriptions  │  │
        ┌──│                │◀─┘
        │  │   - version    │
        │  │                │
        │  └────────────────┘           ┌────────────────────────┐
        │           │                   │                        │
        │           ─ ─ ─ ─ ─ ─ ─ ─ ─ ─▶│committers_descriptions │
        │                               │                        │
        │                               └────────────────────────┘
        │                                           │
has_many│:through committers_descriptions
        │                                           │
        │           ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
        │
        │           │
        │
        │           ▼
        │  ┌────────────────┐
        │  │                │
        │  │                │
        └─▶│   committers   │
           │                │
           │                │
           └────────────────┘
https://www.youtube.com/embed/gyuQvkoiTPE
The running time depends mostly on how many packages have to be indexed.
As a tangible example, I sampled the first 100 packages:
% time MAX_INDEX_SIZE=100 bx rails runner Tasks::CranIndex.revise
0.67s user 0.11s system 0% cpu 11:00.15 total
So it takes 11 minutes to update the information for 100 packages.
There are 8302 packages in total, which means it takes about 15 hours to update all listed packages.
15.22 = 11 * (8302 / 100) / 60
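The same extrapolation can be written as a small Ruby helper. This is only a sketch of the arithmetic above; the method name `estimated_hours` is made up for illustration, and it assumes the time per package stays roughly constant across the whole index.

```ruby
# Extrapolate the full-run time from the 100-package sample.
# Assumption: time per package is roughly constant.
SAMPLE_MINUTES = 11.0   # measured wall-clock time for the sample
SAMPLE_SIZE    = 100    # packages in the sample
TOTAL_PACKAGES = 8302   # packages listed in total

# Hypothetical helper, not part of the indexer itself.
def estimated_hours(total, sample_size, sample_minutes)
  sample_minutes * (total.to_f / sample_size) / 60
end

puts estimated_hours(TOTAL_PACKAGES, SAMPLE_SIZE, SAMPLE_MINUTES).round(2)
```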
But note that this is the case where all packages are updated from scratch.
I made the Package Indexer skip the API call for a description unless the package has been updated. Thus, the second and later runs take much less time. For example, a second run 30 minutes after the first took 31 seconds, because no description had changed.
GET http://localhost:3000/packages.json
I use Enumerator.new quite often, mainly for two reasons: readability, and durability against more complex demands in the future.
Admittedly, a method of this size doesn't need Enumerator.new, but it's still worth using for those purposes.
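To illustrate the kind of usage I mean, here is a minimal sketch. The names `FAKE_INDEX` and `each_package_name` are hypothetical and the list is an in-memory stand-in, not the real CRAN API; the point is only that the caller sees a plain Enumerator and the paging logic stays hidden inside.

```ruby
# In-memory stand-in for a paged package index (sample names only).
FAKE_INDEX = %w[A3 abc abind acepack ada].freeze

# Hypothetical method: walks the index page by page and yields one
# package name at a time, so callers never deal with paging.
def each_package_name(page_size: 2)
  Enumerator.new do |yielder|
    FAKE_INDEX.each_slice(page_size) do |page|
      page.each { |name| yielder << name }
    end
  end
end

# Callers can chain lazily and stop early without reading every page.
first_three = each_package_name.lazy.map(&:upcase).first(3)
puts first_three.inspect
```

If the source later becomes a real paged HTTP endpoint, only the block inside Enumerator.new would need to change.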
I initialised a new version of the description and used description.new_record?
to determine whether it needs to be updated.
This greatly shortened the Package Indexer's running time.
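The idea can be sketched in plain Ruby. This is not the actual model code: `FakeDescription` is a made-up in-memory stand-in for an ActiveRecord model, and I'm assuming a lookup in the style of find_or_initialize_by keyed on package and version.

```ruby
# In-memory stand-in for an ActiveRecord model (assumption, for illustration).
class FakeDescription
  STORE = {}

  attr_reader :package, :version

  # Mimics find_or_initialize_by: return the stored record or a new, unsaved one.
  def self.find_or_initialize_by(package:, version:)
    STORE[[package, version]] || new(package, version)
  end

  def initialize(package, version)
    @package = package
    @version = version
    @persisted = false
  end

  def new_record?
    !@persisted
  end

  def save
    @persisted = true
    STORE[[package, version]] = self
  end
end

# Hypothetical indexing step: returns true when a fetch was needed.
def index_package(package, version)
  description = FakeDescription.find_or_initialize_by(package: package, version: version)
  return false unless description.new_record? # version already indexed, skip the API call

  # ... the real indexer would fetch the description from the API here ...
  description.save
  true
end
```

Calling index_package twice with the same package and version does the expensive work only the first time, which is where the saving on later runs comes from.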
description.authors and description.maintainer refer to the same model, Committer. This is done by combining has_and_belongs_to_many with the type column of the committers table, where an author's committers.type = 'Author'. I'm not claiming it's a good approach, but it's an experiment.
Apart from the above, I follow the usual OOP principles, including SRP, DI and so on.
Thank you very much for taking the time to read this.
Requirements
description
Package Indexer
A batch job that indexes information fetched from the CRAN web API for all packages.
Implement it with Rails runner, where the command to run the task is the following:
bx rails runner Tasks::CranIndex.revise
Package List
A simple web API that responds with package info in JSON.
GET http://localhost:3000/packages.json