proycon / codemetapy

A Python package for generating and working with codemeta
https://codemeta.github.io/
GNU General Public License v3.0
24 stars 5 forks source link

Gitlab.com + Selfhosted GItlab/Github parse features #19

Closed xmichele closed 2 years ago

xmichele commented 2 years ago

Works with GITLAB_TOKEN. Backward compatiblity tested. Not complete as current Github parser but it can be improved in the future To work effectively combine with codemeta-harvester pull request

xmichele commented 2 years ago

Hello, some steps forwards and added the selfHosted gitlab/gitapi parse feature! A curiosity: is there any reason for the gitapi return type to be -> Union[URIRef,BNode,None]: ? I just see a return of a string, e.g. repo_url Thanks

proycon commented 2 years ago

Hello, some steps forwards and added the selfHosted gitlab/gitapi parse feature!

Nice! It's good to allow people to self-host indeed

A curiosity: is there any reason for the gitapi return type to be -> Union[URIRef,BNode,None]: ? I just see a return of a string, e.g. repo_url Thanks

Hmm, that looks like an oversight indeed, it shouild just be str

proycon commented 2 years ago

Sorry also for the delay lately, but we're making good progress I think, almost there! :)

proycon commented 2 years ago

@xmichele I did some code reviewing and refactoring in commit 3f15c3d:

This commit does revert a few of the changes that I think are not needed for the wanted functionality and keeps the code cleaner:

It may require some further testing, but if you also agree with these changes I can probably push it directly to your master branch.

xmichele commented 2 years ago

@xmichele I did some code reviewing and refactoring in commit 3f15c3d:

This commit does revert a few of the changes that I think are not needed for the wanted functionality and keeps the code cleaner:

  • keeps gitapi and web split:

    • no need to invoke get_rate_limit if you don't expect github/gitlab rate limiting (removed the rate_limited argument)
    • splits the gitapi.parse() function into a get_repo_kind() and parse()
    • no joined gitapi/web input type
  • simplified the test for whether a custom URL is a gitlab instance

    • currently there is no longer a test to check if a custom URL is a github instance, I don't know if this occurs in practice? There's no self-hosted github is there? (only github enterprise, do they allow custom domains?)
  • no need for repo_kind to be a tuple, better to return explicitly

    • merged repo_kind with inputtype where possible
  • brought get_rate_limit a bit closer to how it was, keeping the logic simpler
  • some style fixes to keep things consistent (4 space indentation, no redundant parentheses)

It may require some further testing, but if you also agree with these changes I can probably push it directly to your master branch.

Overall I think that can be fine! Some open points:

proycon commented 2 years ago

Overall I think that can be fine!

Great, I pushed it to your branch so it is now part of this PR

Some open points:

  • most of badges are in the .md, e.g. Readme.md and we shall look for them as well

I'm not sure what you mean here. The badge code in serializers.html.get_badges() is pretty specifically to generate badges for git platforms . It's not meant to propagate all the badges that might be in the readme.md .

I do parse badges from the readme in other parts of codemeta-harvester, like for the repostatus badge.

  • the repo-kind cache shall be persistent because multiple repo scanning imply codemetapy memory loss due to re-execution

We could make it a bit more persistent yeah, just as long as it remains a simple cache. It's not a real priority ithough, I'd say keep it for a later PR so we can merge this one first?

  • scan an arbitrary number of private repos ( and not just a single gitlab and github ): the gitlab_token/github_token shall be in the repo-project.yaml or anyway related to a repo url/base_uri. This implies some security precautions but with this the auto-detect will be even less important

That's probably more up to codemeta-harvester? To just invoke codemetapy with the proper environment variables set.

xmichele commented 2 years ago

Ok for the last points. For the badges I think that the mod to codemeta-harvester and codemetapy is not that difficult (e.g. grep img.shield.io/.. in readme) and appending at the end of repo page info will be a very cool feature about the projects characteristics :)

proycon commented 2 years ago

Sorry it took a while, I wanted to finish some of the test suite (#20) first before I was confident enough to merge things. This is now merged into the master branch (with several fixes on top). Thanks again for your contribution!