lauriii / drupalcores

A project to generate a list of core contributors
http://drupalcores.com
MIT License

Companies and countries are now broken as d.o returns '403 Unauthorized' #119

Closed: jcnventura closed this issue 5 years ago.

jcnventura commented 5 years ago

Drupal.org has some kind of spambot prevention in place that easily kicks in when running the drupalcores company + country mapping.

The open(url) call to the user page throws an exception with message '403 Unauthorized'.

To solve this, we either have to login and issue these calls with a cookie, or somehow get the company + country mappings from a different data source.

lauriii commented 5 years ago

Thanks for investigating this. We could try to get this information from the Drupal.org APIs but we would have to get confirmation if the same restrictions apply to the APIs.

mlhess commented 5 years ago

The restrictions do not apply to the Drupal.org API. If that data is in there, that would be the better way to go.

borisson commented 5 years ago

The problem with using the APIs is that it will result in a lot more calls.

For example, if we call https://www.drupal.org/api-d7/user.json?name=borisson_ we get a field_organizations field. The first URL in it is https://www.drupal.org/api-d7/field_collection_item/1349157, which does not return a result, but https://www.drupal.org/api-d7/field_collection_item/1349157.json does (just appending .json to it shouldn't be too hard, though). From there we get the organisation name: Calibrate, in this case.
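A minimal Python sketch of the name lookup described above, not the code that was merged. Only the URLs, the .json suffix rule and the field_organizations field come from this thread; the payload layout (user records under "list", a "uri" key on each field_organizations entry) and the "field_organization_name" key are assumptions. The HTTP call is passed in as a `fetch` callable so the walk can be tried without network access.

```python
from typing import Callable, Dict, List


def to_json_url(uri: str) -> str:
    """Resource URIs from the API only return data once '.json' is appended."""
    return uri if uri.endswith(".json") else uri + ".json"


def organization_names(user_payload: Dict, fetch: Callable[[str], Dict]) -> List[str]:
    """Resolve a user's organisation names from an api-d7 user.json payload.

    Assumed (not confirmed in this thread): user records sit under "list",
    each field_organizations entry exposes its URL under "uri", and the
    field collection item stores the name under "field_organization_name".
    """
    names = []
    for user in user_payload.get("list", []):
        for ref in user.get("field_organizations") or []:
            # One extra request per organisation: field_collection_item/<id>.json
            item = fetch(to_json_url(ref["uri"]))
            name = item.get("field_organization_name")
            if name:
                names.append(name)
    return names
```

In practice `fetch` could wrap `urllib.request.urlopen` and `json.load`; keeping it injectable also makes it easy to add caching, since many contributors share the same organisations.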

If we want to display the logo as well, we need to follow the field_organization_reference field (with the same .json problem: https://www.drupal.org/api-d7/node/1962504.json). From there we can get the logo, https://www.drupal.org/api-d7/file/4481286.json, but that doesn't give us the cropped logo, so we would still need to build the correct URL for a smaller logo or do our own cropping.
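To put a rough number on "a lot more calls", here is a small sketch that counts the API requests per user implied by the chain above: one for user.json, then one field_collection_item/<id>.json per organisation for the name, plus node/<id>.json and file/<id>.json per organisation if the logo is shown. It is arithmetic only, assumes no caching, and ignores any cropping work; the function name is hypothetical.

```python
def requests_per_user(organisations: int, with_logo: bool) -> int:
    """Count api-d7 requests needed to resolve one user's organisations.

    1 request for user.json, then per organisation:
      - field_collection_item/<id>.json for the name (always), and
      - node/<id>.json (via field_organization_reference) and
        file/<id>.json for the logo, only when with_logo is True.
    """
    per_org = 3 if with_logo else 1
    return 1 + organisations * per_org
```

For a user listing two organisations that is 3 requests without logos and 7 with them, compared with the single user-page request the scraper made, which is why dropping the logo simplifies things considerably.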

So it should be quite easy to find the correct company name, but if we want to keep the same functionality and display the logo as well, this could lead to some headaches.

jcnventura commented 5 years ago

I think we can remove the logo, to be honest. The logo even makes it harder to search for company names...

borisson commented 5 years ago

We should probably reach consensus on that. Dropping the logo would make using the API a lot easier, but just removing the functionality wouldn't be great. If we do agree on it, though, that would make life so much easier.

lauriii commented 5 years ago

I would be fine removing the logo, at least temporarily. As mentioned, this would fix https://github.com/lauriii/drupalcores/issues/68 as well.

jcnventura commented 5 years ago

Working on this, I've found a weakness in the drupal.org API. This works:

https://www.drupal.org/api-d7/user.json?name=Wim%20Leers

But this does not:

https://www.drupal.org/api-d7/user.json?name=wim-leers

Neither does this:

https://www.drupal.org/api-d7/user.json?name=wim%20leers

We need a way to make the name query case-insensitive, or to have the user alias returned as a separate part of the JSON response.
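Since the endpoint only matches the exact username, the query string has to be built from the name with its original case and spacing. A minimal Python sketch (the helper name is hypothetical) that percent-encodes the name with `urllib.parse.quote`, which turns the space into %20 as in the working URL above:

```python
from urllib.parse import quote

USER_ENDPOINT = "https://www.drupal.org/api-d7/user.json"


def user_api_url(name: str) -> str:
    """Build the user.json lookup URL for an exact drupal.org username.

    The name must keep its original case and spacing ("Wim Leers"): per the
    observations above, "wim leers" and the alias "wim-leers" match nothing.
    safe="" makes quote() also escape "/", which is otherwise left as-is.
    """
    return USER_ENDPOINT + "?name=" + quote(name, safe="")
```

`user_api_url("Wim Leers")` gives https://www.drupal.org/api-d7/user.json?name=Wim%20Leers. This does not make the lookup case-insensitive; it only guarantees the name is sent exactly as given.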

jcnventura commented 5 years ago

In the meantime I've figured out that the reason we are searching for the lowercased name is that drupalcores lowercases all names. The git commits have the right case, so we can use that.

Ignore my previous comment.
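Using the case from the git commits could look roughly like the following Python sketch: keep a lookup from each lowercased name back to the spelling seen in the commits, and query the API with that spelling. It is a hypothetical helper, not the project's code, and it assumes the contributor names have already been extracted from the commit messages.

```python
from typing import Dict, Iterable


def case_lookup(commit_names: Iterable[str]) -> Dict[str, str]:
    """Map each lowercased contributor name back to the case seen in git.

    Hypothetical helper: assumes the names were already parsed out of the
    commit messages. If a name appears with several spellings, the first
    one encountered wins.
    """
    lookup: Dict[str, str] = {}
    for name in commit_names:
        lookup.setdefault(name.lower(), name)
    return lookup
```

With `lookup = case_lookup(["Wim Leers", "borisson_"])`, `lookup["wim leers"]` returns "Wim Leers", which is the form the user.json query accepts. Picking the first spelling is the simplest choice; counting spellings and taking the most frequent one would be more robust if commit messages disagree.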

borisson commented 5 years ago

I tested the PR and it looks like it does a good job.

When I ran gulp test, I got 35 contributors under "Users not found", and 4 companies were under "should be filled via company infos".

The companies looked very good (except for the same users that were under "Users not found").

borisson commented 5 years ago

I think this PR drastically speeds up generating the results, so we can probably get it in and run it again to get all the new data. We can always improve the results afterwards.

lauriii commented 5 years ago

The PR has been now merged. Thank you everyone! 🙏

I'm currently on a trip to Finland and don't have my SSH keys with me, so I can't easily access the server. I will run the manual companies update this Thursday or Friday when I get back home.