protontypes / AwesomeCure

Analyze and cure awesome lists by collecting, processing and presenting data from listed Git projects.
MIT License

Markdown parser that works with most awesome lists #4

Open Ly0n opened 2 years ago

Ly0n commented 2 years ago

We need a better, more robust parser that works with all the awesome lists tested by the linter.

tjarkdoering commented 2 years ago

I feel like doing this. What are our requirements for this? We have some in the other issues already (#5 , #2 , maybe #7 ).

What I am currently thinking about: this should work as a GitHub Action that creates and then updates one or multiple CSV files, which can then be used further by AwesomeCure.
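The CSV output step could be sketched like this with the standard library; the column names (`rubric`, `name`, `url`, `oneliner`) are assumptions for illustration, not a fixed schema:

```python
import csv
import io

# Hypothetical rows as a parser might produce them from an awesome list.
rows = [
    {"rubric": "Energy Modeling", "name": "PyPSA",
     "url": "https://github.com/PyPSA/PyPSA",
     "oneliner": "Python for Power System Analysis."},
    {"rubric": "Climate Data", "name": "xarray",
     "url": "https://github.com/pydata/xarray",
     "oneliner": "N-D labeled arrays and datasets in Python."},
]

def to_csv(rows):
    """Serialize parsed entries to CSV text, one row per project."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["rubric", "name", "url", "oneliner"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

output = to_csv(rows)
```

In a workflow, the same function would write to a file in the repository, which the Action then commits or uploads on each run.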

Ly0n commented 2 years ago

This is the library we use at the moment: https://github.com/protontypes/AwesomeCure/blob/main/awesomecure/awesome2py.py It was developed by @kikass13 and works perfectly for OpenSustain.tech.

What are our requirements for this? To make this project work with most awesome lists, we have to create a much more generic version, because you do not want to lose the list's context information, like the rubric and the one-liner. A simple solution would be to use this package: https://pypi.org/project/urlextract/ In that case, however, we would lose context information like the one-liner.
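The trade-off can be illustrated with a minimal sketch. Bare URL extraction (what a package like urlextract provides) returns links without their surrounding context, while a small structure-aware parse keeps the rubric and one-liner. The sample list fragment and the entry pattern `- [Name](URL) - description` are assumptions based on the common awesome-list format, not on AwesomeCure's actual parser:

```python
import re

# Toy awesome-list fragment: rubrics (## headings) followed by entries.
SAMPLE = """\
## Energy Modeling

- [PyPSA](https://github.com/PyPSA/PyPSA) - Python for Power System Analysis.
- [Calliope](https://github.com/calliope-project/calliope) - A multi-scale energy systems modelling framework.

## Climate Data

- [xarray](https://github.com/pydata/xarray) - N-D labeled arrays and datasets in Python.
"""

# Bare URL extraction: links come back stripped of rubric and one-liner.
urls = re.findall(r"https?://\S+", SAMPLE)

# Context-preserving parse: track the current rubric and capture the
# entry name, URL, and one-liner for each list item.
ENTRY = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)\s*-\s*(?P<oneliner>.+)$")

def parse_awesome(markdown: str):
    rubric = None
    entries = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            rubric = line[3:].strip()
            continue
        m = ENTRY.match(line.strip())
        if m:
            entries.append({"rubric": rubric, **m.groupdict()})
    return entries

entries = parse_awesome(SAMPLE)
```

A generic parser would need to handle more layouts than this single pattern (nested lists, badges, varying separators), which is exactly why the current list-specific approach does not transfer to most awesome lists.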

You could also have a look at the linter itself, because it also needs to parse the markdown to lint the individual entries: https://github.com/sindresorhus/awesome-lint

Another solution can be found here, but the code is not under an open-source license: https://github.com/lee212/md2dict

> This should work as a GitHub-Action that creates, then updates one or multiple csv files that can then be used further by AwesomeCure.

That would be a good solution. Once we refactor AwesomeCure into a proper Python package, we can separate it into different modules.