suzukimilanpaak / cran_index


Requirements, Design & Architecture #1

Open suzukimilanpaak opened 8 years ago

suzukimilanpaak commented 8 years ago

Requirements

description

Package Indexer

Batch job to index information fetched from the CRAN web API for all packages.

Implement it with Rails runner; the command to run the task is: bx rails runner Tasks::CranIndex.revise

A simple web API which responds with package information in JSON.

suzukimilanpaak commented 8 years ago

Design of Models

I used the same terminology as the CRAN API to avoid unnecessary confusion for other readers of the code.

ER diagram

package has_many descriptions and package caches current_description

            ┌────────────────┐
            │                │
            │                │
            │    packages    │──┐
            │                │  │
            │                │  │
            └────────────────┘  │
                     ┬          │
                     │          │
                     │ has_many │ has_one :current_version
                     │          │
                     ▼          │
            ┌────────────────┐  │
            │                │  │
            │  descriptions  │  │
         ┌──│                │◀─┘
         │  │   - version    │
         │  │                │
         │  └────────────────┘     ┌────────────────────────┐
         │           │             │                        │
         │            ─ ─ ─ ─ ─ ─ ▶│committers_descriptions │
         │                         │                        │
         │                         └────────────────────────┘
         │                                      │
 has_many│:through committers_descriptions
         │                                      │
         │           ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
         │
         │           │
         │
         │           ▼
         │  ┌────────────────┐
         │  │                │
         │  │                │
         └─▶│   committers   │
            │                │
            │                │
            └────────────────┘
suzukimilanpaak commented 8 years ago

Notes & Architecture

Package Indexer

Package List

suzukimilanpaak commented 8 years ago

Result of Package Indexer

https://www.youtube.com/embed/gyuQvkoiTPE

How long does updating packages take?

It depends mostly on how large the packages are.

As a concrete example, I sampled the first 100 packages:

% time MAX_INDEX_SIZE=100 bx rails runner Tasks::CranIndex.revise
0.67s user 0.11s system 0% cpu 11:00.15 total

So it takes about 11 minutes to update the information for 100 packages.

There are 8302 packages in total, so updating all listed packages takes about 15 hours: 11 * (8302 / 100) / 60 = 15.22
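The estimate can be reproduced with a few lines of Ruby. This is only a sketch of the arithmetic; the constant and method names are made up for illustration, and it assumes the time per package stays the same as in the 100-package sample.

```ruby
# Back-of-the-envelope estimate using the figures quoted above.
MINUTES_PER_SAMPLE = 11.0  # wall-clock minutes measured for the sample run
SAMPLE_SIZE        = 100   # packages in the sample (MAX_INDEX_SIZE=100)
TOTAL_PACKAGES     = 8302  # packages listed in total

# Scales the sample time linearly to the full list and converts to hours.
def estimated_hours(total, sample_size, sample_minutes)
  sample_minutes * (total.to_f / sample_size) / 60
end

hours = estimated_hours(TOTAL_PACKAGES, SAMPLE_SIZE, MINUTES_PER_SAMPLE)
puts format('%.2f hours', hours)
```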

Note, however, that this is the case where all packages are updated from scratch.

I made the Package Indexer skip the API call for a description unless the package has been updated. Thus, it takes much less time to update the information on the second and later runs. For example, a second run 30 minutes after the first took 31 seconds, as no description had been updated.

suzukimilanpaak commented 8 years ago

Result Package List

suzukimilanpaak commented 8 years ago

Code which might be worth reading

Enumerator.new

sample

I use Enumerator.new quite often, mainly for the following two reasons:

Actually, a method of this size doesn't need Enumerator.new, but it's good to use for the sake of readability and of robustness against more complex requirements in the future.
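The sample itself is not reproduced here, so the following is only a minimal sketch of the general pattern, not the project's code. It assumes a hypothetical paged source (PAGES and fetch_page are made up) and shows how Enumerator.new can hide the paging loop behind a single lazy stream.

```ruby
# Hypothetical stand-in for a remote package list served in pages.
PAGES = [%w[A3 abc], %w[abind], []].freeze

# Returns one page of names, or an empty array past the last page.
def fetch_page(index)
  PAGES.fetch(index, [])
end

# Wraps the paging loop in an Enumerator so callers see one stream of names.
def each_package_name
  Enumerator.new do |yielder|
    page = 0
    loop do
      names = fetch_page(page)
      break if names.empty?

      names.each { |name| yielder << name }
      page += 1
    end
  end
end

# Only as many pages as needed are fetched to satisfy the caller.
puts each_package_name.first(2).inspect
```

Callers can then chain the usual Enumerable methods (first, map, each_slice and so on) without knowing how the names are fetched.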

find_or_initialize_by

I initialised a new version of the description and used description.new_record? to find out whether it needed to be updated.

This greatly shortened the time the Package Indexer takes.
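The idea can be sketched without a database. In the project this is done with ActiveRecord's find_or_initialize_by and new_record?; the DescriptionStore and index_package below are hypothetical in-memory stand-ins written only to illustrate the control flow, not the project's actual code.

```ruby
# Hypothetical in-memory stand-in for a row of the descriptions table.
Description = Struct.new(:package, :version, :title, :persisted) do
  def new_record?
    !persisted
  end
end

# Hypothetical store that mimics the find-or-initialize idea.
class DescriptionStore
  def initialize
    @rows = {}
  end

  # Returns the stored row, or an unsaved one if none exists yet.
  def find_or_initialize_by(package:, version:)
    @rows[[package, version]] || Description.new(package, version, nil, false)
  end

  def save(description)
    description.persisted = true
    @rows[[description.package, description.version]] = description
  end
end

# Returns true when the expensive fetch (the block) was performed.
def index_package(store, package, version)
  description = store.find_or_initialize_by(package: package, version: version)
  return false unless description.new_record? # already indexed: skip the fetch

  description.title = yield(package) # stands in for the description API call
  store.save(description)
  true
end

store = DescriptionStore.new
fetches = 0
2.times do
  index_package(store, 'abc', '2.1') do
    fetches += 1
    'Tools for ABC'
  end
end
puts fetches
```

The second call finds the stored version and returns before the block runs, which is why later runs are so much faster than the first.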

has_and_belongs_to_many with STI

description.authors and description.maintainer refer to the same model, Committer. This is done by combining has_and_belongs_to_many with the type column of the committers table, where an author's committers.type = 'Author'. I'm not claiming it's a good approach; it's an experiment.

Other important code in this project

Apart from the above, I follow the usual OOP principles, including SRP, DI and so on.

Model

Controller

suzukimilanpaak commented 8 years ago

Thank you very much for reading and for taking the time.