haasad / EcoInventDownLoader

Download, unpack and import ecoinvent into your brightway2 project in one simple step
MIT License

Logging issue #30

Open romainsacchi opened 10 months ago

romainsacchi commented 10 months ago

Could there be a login issue now that ecoinvent has changed its web interface?

I struggle to log in even though I'm fairly confident I'm entering the correct user ID and password. In fact, I can log in on the website, but not via eidl.

haasad commented 10 months ago

Hi Romain,

Sorry for the late reply. It looks like ecoinvent switched to single sign-on authentication with Keycloak. This is not something eidl can currently support. Some of the code needs to be rewritten to make it work again.

The main problem is that I don't work with LCA/ecoinvent anymore, so I don't have ecoinvent credentials to test and develop this :-(

Does the page after logging in still look the same? If yes, then it's probably a smallish change to make it work again. If no, then eidl is most likely completely obsolete and needs to be replaced with something new.

romainsacchi commented 10 months ago

Thanks Adrian. Unfortunately, I believe this also affects Activity Browser -- will let them know, maybe they can work on it.

haasad commented 10 months ago

I'll leave the issue open. If it's not fixed then eidl is pretty much useless.

marc-vdm commented 10 months ago

@romainsacchi We are aware, thanks for the initiative though.

haasad commented 10 months ago

I was able to organize some credentials for testing. I did some quick scripting to check what changed on the ecoinvent side:

UN=<username>
PW=<password>
TOKEN=$(curl -d "client_id=apollo-ui" -d "username=$UN" -d "password=$PW" -d "grant_type=password" https://sso.ecoinvent.org/realms/ecoinvent/protocol/openid-connect/token | jq -r .access_token)
curl -H "Authorization: Bearer $TOKEN" https://ecoquery.ecoinvent.org/3.9.1/cutoff/files
<!doctype html><html lang="en"><head><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1"/><link rel="icon" href="/icons/favicon.ico"/><link rel="apple-touch-icon" sizes="180x180" href="/icons/apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="/icons/favicon-32x32.png"/><link rel="icon" type="image/png" sizes="16x16" href="/icons/favicon-16x16.png"/><link rel="manifest" href="/site.webmanifest"/><link rel="mask-icon" href="/icons/safari-pinned-tab.svg" color="#dd1414"/><meta name="msapplication-TileColor" content="#dd1414"/><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700&display=swap"/><link rel="manifest" href="/manifest.json"/><script defer="defer" src="/static/js/main.4268775c.js"></script><link href="/static/css/main.f04e5175.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div><div id="portal"></div></body></html>
curl -s -H "Authorization: Bearer $TOKEN" https://api.ecoquery.ecoinvent.org/web/versions | jq '.[0:3], {"total: ": length}'
[
  {
    "version": "3.9.1",
    "release_date": "2022-10-01",
    "system_model": "apos"
  },
  {
    "version": "3.9.1",
    "release_date": "2022-10-01",
    "system_model": "cutoff"
  },
  {
    "version": "3.9.1",
    "release_date": "2022-10-01",
    "system_model": "consequential"
  }
]
{
  "total: ": 38
}

The bad news:

The good news:

I haven't found any public documentation for the API, but I'll play around with it a bit and see what I can do. Would be much cleaner anyway to get the required info via API instead of the very brittle HTML parsing "hack" used before.

romainsacchi commented 10 months ago

We can directly ask ecoinvent to provide us with the API documentation if you want. Should I?

haasad commented 10 months ago

Sure, asking can't hurt. Otherwise it should be possible to "reverse engineer" the necessary API calls by using the dev tools in the browser, but I haven't gone so deep yet.

cmutel commented 10 months ago

So, I should have communicated this already. Sorry about that.

There is currently no library to consume from the ecoinvent API, and no official documentation for the API endpoints. You should not expect there to be either of these anytime soon.

The SSO uses JSON Web Tokens, and normally logging in requires some shared secret (at least as far as I understand it). I haven't been able to get something working, though this stuff is far outside my comfort level.

You can watch the API calls to build a map of the routes, and can probably figure out what needs to be sent to get the results you want. I think the only difficult thing here is authentication. One additional hiccup could be getting the file downloads, as these URLs are generated per session and link click.

For the time being I think you need to tell people to log in and download the file manually, and then automate the import. It's not great that the EIDL library is being killed off by actions of others but there isn't much I can do about this, and certainly not right now. Sorry.

haasad commented 10 months ago

Thanks for the feedback @cmutel. To be fair, I'm very surprised that eidl even worked for as long as it has now. I've been expecting a breaking change for a long time :smile:

I'll see if I can figure something out with the API.

haasad commented 10 months ago

This is a minimal working example with curl and jq to authenticate and download the ecoinvent 3.9.1_cutoff_ecoSpold02.7z file with the new API:

UN=yourusername
PW=yourpassword
TOKEN=$(curl -s -d "client_id=apollo-ui" -d "username=$UN" -d "password=$PW" -d "grant_type=password" https://sso.ecoinvent.org/realms/ecoinvent/protocol/openid-connect/token | jq -r .access_token)
curl -s https://api.ecoquery.ecoinvent.org/files -H "Authorization: Bearer $TOKEN" > files.json
UUID=$(jq -r '.[] | select(.version_name == "3.9.1") | .releases[] | select(.system_model_name == "Allocation cut-off by classification") | .release_files[] | select(.name == "ecoinvent 3.9.1_cutoff_ecoSpold02.7z") | .uuid' files.json)
download_url=$(curl -s "https://api.ecoquery.ecoinvent.org/files/r/$UUID" -H "Authorization: Bearer $TOKEN" | jq -r .download_url)
curl "$download_url" -o ecoinvent_3.9.1_cutoff_ecoSpold02.7z

So it's totally doable, I "only" need to translate these steps to python.
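For readers who want to follow the same steps in Python, here is a minimal sketch using only the standard library. It is not the code that ended up in eidl; the function names (`post_form`, `get_json`, `find_file_uuid`, `download`) are made up for illustration, and the JSON field names are taken directly from the jq filter above.

```python
import json
import urllib.parse
import urllib.request

SSO_URL = "https://sso.ecoinvent.org/realms/ecoinvent/protocol/openid-connect/token"
API_URL = "https://api.ecoquery.ecoinvent.org"


def post_form(url, fields):
    """POST form-encoded fields and return the decoded JSON response."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)


def get_json(url, token):
    """GET a URL with a bearer token and return the decoded JSON response."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_file_uuid(files, version, system_model, filename):
    """Mirror the jq filter: walk versions -> releases -> release_files."""
    for entry in files:
        if entry.get("version_name") != version:
            continue
        for release in entry.get("releases", []):
            if release.get("system_model_name") != system_model:
                continue
            for release_file in release.get("release_files", []):
                if release_file.get("name") == filename:
                    return release_file["uuid"]
    return None


def download(username, password, version, system_model, filename, target):
    """Run the same four requests as the curl example (needs valid credentials)."""
    token = post_form(SSO_URL, {
        "client_id": "apollo-ui",
        "username": username,
        "password": password,
        "grant_type": "password",
    })["access_token"]
    files = get_json(f"{API_URL}/files", token)
    uuid = find_file_uuid(files, version, system_model, filename)
    if uuid is None:
        raise LookupError(f"{filename!r} not found in the /files listing")
    url = get_json(f"{API_URL}/files/r/{uuid}", token)["download_url"]
    urllib.request.urlretrieve(url, target)
```

Keeping the lookup in `find_file_uuid` separate from the network calls means it can be checked against a saved `files.json` without logging in.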

cmutel commented 10 months ago

@haasad Wow, amazing! And I had no idea that you could use jquery on the command line, that sort of blows my mind.

See follow-up comment below.

~With this new architecture I think one should rewrite EIDL completely to make it more complete.~ We have already forked it here: https://github.com/brightway-lca/ecoinvent_interface, but I don't really care where the repo is, as long as one can expect it is maintained. That was the original reason for creating a fork.

~Here are the user stories that a new version would address:~

- ~As a user, I want to be able to find the integer id of a process given its filename (combination of UUIDs), so that I can perform follow-up operations on that activity~
- ~As a user, I want to be able to find the integer id of a process given its activity, product, location, and unit, so that I can perform follow-up operations on that activity~
- ~As an auditor, I want to be able to get the PDF report on a process, so that I can audit LCIs built on top of ecoinvent~
- ~As a programmer, I want to be able to get a complete ecoinvent release, so that I can install the database locally~
- ~As a user, I want to be able to get the ecospold XML for a single process, so that I can modify or install it myself~
- ~As a tool developer, I want to be able to get LCIA scores for one or more processes, so that I can build quick and simple calculators based on ecoinvent~

~These are real user stories, and some client library needs to support them. @haasad do you think that EIDL could be adapted for this broader functionality, or should we create something new on our own?~

cmutel commented 10 months ago

I don't think it makes sense to do a broader refactor now, as the ecoinvent publication API is apparently expected to change a lot in the future.

haasad commented 10 months ago

I have now released eidl 2.0.0, which works with the new ecoinvent website. It now uses a bearer token in the HTTP request header for authentication and uses the API to find available files instead of parsing the HTML of the page as it used to. Tokens are refreshed automatically before they're used. Available version/system model combinations are still parsed from the filenames as before, since I haven't found a good way to do this with the API.
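To illustrate what "refreshed automatically before they're used" can look like, here is a rough sketch. It is not eidl's actual implementation. It assumes the token response also carries `expires_in` and `refresh_token` fields and that the endpoint accepts `grant_type=refresh_token`, as is usual for OAuth2 servers; neither is shown in this thread. The `BearerToken` class and its `fetch` callable are hypothetical names.

```python
import time


class BearerToken:
    """Keep an access token fresh.

    `fetch` is a hypothetical callable that posts form fields to the token
    endpoint and returns the decoded JSON response as a dict.
    """

    def __init__(self, fetch, username, password, client_id="apollo-ui",
                 margin=30, clock=time.monotonic):
        self._fetch = fetch
        self._client_id = client_id
        self._login_fields = {
            "client_id": client_id,
            "username": username,
            "password": password,
            "grant_type": "password",
        }
        self._margin = margin  # renew this many seconds before expiry
        self._clock = clock
        self._access = None
        self._refresh = None
        self._expires_at = 0.0

    def _store(self, response):
        self._access = response["access_token"]
        # Assumption: standard OAuth2 fields, not confirmed in this thread.
        self._refresh = response.get("refresh_token")
        self._expires_at = self._clock() + response.get("expires_in", 0)

    def headers(self):
        """Return request headers, renewing the token first if needed."""
        if self._access is None:
            self._store(self._fetch(self._login_fields))
        elif self._clock() >= self._expires_at - self._margin:
            if self._refresh:
                fields = {
                    "client_id": self._client_id,
                    "grant_type": "refresh_token",
                    "refresh_token": self._refresh,
                }
            else:
                # No refresh token available: fall back to a fresh login.
                fields = self._login_fields
            self._store(self._fetch(fields))
        return {"Authorization": f"Bearer {self._access}"}
```

Every API request would then call `headers()` right before it is sent, so an expired token never reaches the server.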

It should be available soonish on conda-forge (PR is merged), it's already available on the bsteubing channel.

@romainsacchi @marc-vdm I've tested it stand-alone and in the activity-browser and everything seems to work. But I'd be grateful if you could test it additionally and let me know if you encounter any issues.

One thing I haven't figured out yet is how it works/breaks for users with restricted ecoinvent licenses. (see #28 and https://github.com/LCA-ActivityBrowser/activity-browser/issues/775). This info was previously available as an HTML tag.

@cmutel:

> We have already forked it here: https://github.com/brightway-lca/ecoinvent_interface, but I don't really care where the repo is, as long as one can expect it is maintained. That was the original reason for creating a fork.

I'm happy to continue maintaining eidl in its current scope (I've never really stopped). It's pretty essential for the activity-browser to work if users want a GUI only experience. Besides the download from the ecoinvent page, the cross-platform 7zip extraction was a big inconvenience before eidl.

But eidl's scope is pretty limited, I think you'd be better off with a dedicated API client library with proper documentation for the type of user stories you mentioned above.

cedric-roussel commented 10 months ago

> One thing I haven't figured out yet is how it works/breaks for users with restricted ecoinvent licenses. (see https://github.com/haasad/EcoInventDownLoader/pull/28 and https://github.com/LCA-ActivityBrowser/activity-browser/issues/775). This info was previously available as an HTML tag.

The /files endpoint currently lists only files that are accessible so there is no need for additional filtering.

haasad commented 10 months ago

> > One thing I haven't figured out yet is how it works/breaks for users with restricted ecoinvent licenses. (see #28 and LCA-ActivityBrowser/activity-browser#775). This info was previously available as an HTML tag.
>
> The /files endpoint currently lists only files that are accessible so there is no need for additional filtering.

That's what I was hoping for and why I didn't use the publicly available /web/versions endpoint, thank you for confirming :+1:
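Since the /files listing only contains accessible files, the version/system model combinations available to a user can be derived from it directly. The sketch below parses them out of the file names. The regular expression is an assumption based on the single name seen in this thread (`ecoinvent 3.9.1_cutoff_ecoSpold02.7z`); other releases may be named differently, and `available_releases` is a hypothetical helper, not part of eidl.

```python
import re

# Assumed naming scheme: "ecoinvent <version>_<system model>_ecoSpold02.7z"
PATTERN = re.compile(
    r"^ecoinvent (?P<version>\d+(?:\.\d+)*)_(?P<model>[A-Za-z-]+)_ecoSpold02\.7z$"
)


def available_releases(files):
    """Return sorted (version, system_model) pairs found in a /files listing."""
    found = set()
    for entry in files:
        for release in entry.get("releases", []):
            for release_file in release.get("release_files", []):
                match = PATTERN.match(release_file.get("name", ""))
                if match:
                    found.add((match["version"], match["model"]))
    return sorted(found)
```

Files that do not match the pattern (for example reports or other formats) are simply skipped.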

jsvgoncalves commented 10 months ago

@haasad IIRC, the main difference in https://github.com/brightway-lca/ecoinvent_interface at this point is the persistence of login credentials [1]:

class Settings(BaseSettings):
    username: Optional[str]
    password: Optional[SecretStr]

[1] - https://github.com/brightway-lca/ecoinvent_interface/blob/23825cfaf32f2c504473d3894223d10cce7dd932/eidl/settings.py#L8-L10

haasad commented 10 months ago

@jsvgoncalves I wasn't aware of the fork before this discussion. I'll gladly accept a pull request if this would be useful for you. At first glance it looks like most of the other additional features (pdf download, logged_in decorator etc) are also broken with the new website.

@cmutel @cedric-roussel @jsvgoncalves I'm also totally open to discuss transferring this repo to the brightway-lca or ecoinvent orgs on github if you like. Or adding you as maintainers here if you feel like you can't depend on me reacting fast enough to issues. In the end this is just a tool I wrote more than 5 years ago, because it was useful for me at the time. I don't actively use it anymore. But I keep investing some effort into it from time to time, because it seems to be useful for others as well. Especially for the ActivityBrowser folks, it's pretty tightly integrated there. In my opinion it makes sense to keep this repo, because it's the source for the conda-forge packaging etc. (https://github.com/conda-forge/eidl-feedstock).

cedric-roussel commented 10 months ago

I like it, that was fast 👍

There is one pitfall currently with legal agreements: they need to be accepted on the website by all new and returning users. Unfortunately, there won't be a clear message provided by the API. Users who have never logged into the new website will receive an empty list, even with a valid license.
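Because the API gives no explicit message in this case, a client can at least turn the empty listing into a readable hint. A small sketch, with a hypothetical exception and helper name (not taken from eidl):

```python
class NoFilesAvailable(RuntimeError):
    """Raised when the /files listing comes back empty."""


def require_files(files):
    """Return the listing unchanged, or raise with a hint if it is empty."""
    if not files:
        raise NoFilesAvailable(
            "The ecoinvent API returned no files. If your license is valid, "
            "log in on the ecoinvent website once and accept the legal "
            "agreements, then try again."
        )
    return files
```

An empty list can have other causes too, so the message is worded as a suggestion rather than a diagnosis.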

marc-vdm commented 10 months ago

@haasad Can confirm this is now working after a regular update (conda update activity-browser).

Kind regards,

Marc van der Meide PhD Candidate

Leiden University | Faculty of Science - Institute of environmental sciences (CML)

Einsteinweg 2 | Leiden 2333 CC | LinkedIn: https://www.linkedin.com/in/marcvandermeide/?locale=en_US



cmutel commented 10 months ago

@haasad Thanks very much for figuring out the tokens code. I have rewritten most of the ecoinvent_interface code using this new approach here: https://github.com/brightway-lca/ecoinvent_interface/tree/two-ooh. The code works, but testing and documentation are still very much TBD.

I chose to go a different direction for this library; see the differences with EIDL here: https://github.com/brightway-lca/ecoinvent_interface/tree/two-ooh#relationship-to-eidl

In the end I think it is fine to have two libraries, at least for now. By Brightcon the ecoinvent_interface needs to have the ability to get process documentation as well.