mozilla / participation-metrics-org

Participation metrics planning repository

Data is not updated #190

Open · canasdiaz opened this issue 5 years ago

canasdiaz commented 5 years ago

Data is not updated for the following data sources:

A change in the latest release broke data enrichment. The release changes how projects.json is used, but it is clearly buggy, and our tests did not catch the error.

This ticket will be closed when all data sources are up to date.

canasdiaz commented 5 years ago

All data sets except Git are up to date. We are looking into some repos related to the Rust project; once they are synced with the origin, this ticket can be closed.

mafesan commented 5 years ago

Hi again,

We ran some tests locally, both for retrieving the raw data and for generating the enriched indexes. Our logs contained no errors, and the number of raw and enriched items matched the data from the original source (in this case, https://github.com/rust-lang/rust).

However, when we ran the same tests on your infrastructure, we saw many connection errors from your server. We therefore suspect that the mismatch in the number of items is related to the amount of memory allocated to your Elasticsearch (ES) cluster, which may be causing some items to be lost during enrichment. Could you review your ES logs and let us know if you notice anything wrong?

Best, Miguel-Ángel

canasdiaz commented 5 years ago

Hi again, we ran a few tests last weekend. In one of them we reproduced the issue while creating a standard Git index with the standard bulk size: the cluster was not available from node 54.183.12.130 on Saturday night (UTC 2019-03-09 23:46:32,949).

@johngian, can you shed some light on this? Could the cluster be overloaded?

We (Bitergia) are about to fix the outdated Git data and publish a new release that tolerates these connection errors.

canasdiaz commented 5 years ago

Hi again folks, data is updated.

We isolated the issue and fixed it with a change to our repository list. Why was that needed? The repository https://github.com/servo/doc.servo.org.git contains a single commit with 7M lines affecting 141K files, which breaks our infrastructure.

@hmitsch, would it be possible to discard that repo? Can I remove it from the spreadsheet?
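The fix described above amounts to dropping one URL from the repository list. A minimal Python sketch of that step, assuming a projects.json-style mapping of project name to data source to list of repository URLs; the project name "servo", the "meta" entry, and the exact layout are illustrative assumptions, not taken from this thread:

```python
import json


def remove_repo(projects: dict, repo_url: str) -> dict:
    """Return a copy of a projects.json-style dict with repo_url removed
    from every data-source list of every project.

    Assumed (hypothetical) layout: {project: {data_source: [url, ...]}}.
    Non-list values (e.g. a metadata dict) are copied unchanged.
    """
    cleaned = {}
    for project, sources in projects.items():
        cleaned[project] = {}
        for source, urls in sources.items():
            if isinstance(urls, list):
                # Keep every URL except the one being discarded.
                cleaned[project][source] = [u for u in urls if u != repo_url]
            else:
                cleaned[project][source] = urls
    return cleaned


if __name__ == "__main__":
    # Illustrative input only; not the real participation-metrics list.
    projects = {
        "servo": {
            "meta": {"title": "Servo"},
            "git": [
                "https://github.com/servo/servo.git",
                "https://github.com/servo/doc.servo.org.git",
            ],
        },
    }
    result = remove_repo(projects, "https://github.com/servo/doc.servo.org.git")
    print(json.dumps(result, indent=2))
```

Returning a new dict rather than mutating the input keeps the original list available for comparison before the cleaned version is written back.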

hmitsch commented 5 years ago

Hi @sanacl,

Thanks for diving into this. Oh man...

I checked out the repo, it says:

Documentation generated from Servo’s source code in its master branch http://doc.servo.org/

We definitely do not need this repo in our list. 😄 Please go ahead and remove it.

Best regards, Henrik

canasdiaz commented 5 years ago

Repo removed, we are done :)