Open mrchristian opened 1 year ago
@Mahvish will run the HTML to Markdown Python script once she has joined the semanticClimate GitHub organisation. The repository with the script is linked above, and the script can be found in the python directory. I will pass along instructions for use from Simon Bowie.
Quote from Simon Bowie:
I wrote a relatively basic Python script (https://github.com/SimonXIX/quarto_semanticclimate/blob/main/python/quarto_markdown.py) which takes HTML as input, processes the HTML, converts it to Markdown, and then processes the Markdown before outputting to a .qmd file.
The HTML generated from the IPCC report was fairly messy so various bits of processing were required on the HTML to convert the styles in the header to HTML styles that the Markdown converter could understand. The processing on the Markdown is to tidy it up and make the converted text look better in Markdown format.
At the most basic level, the HTML to Markdown conversion is done using the markdownify module: https://pypi.org/project/markdownify/
This is run using: python3 ./python/quarto_markdown.py
The input I used was https://github.com/petermr/semanticClimate/blob/main/ipcc/ar6/syr/lr/html/fulltext/groups_groups.html and the output can be seen through Quarto rendering at https://simonxix.github.io/quarto_semanticclimate/groups_groups.html
End quote
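For anyone following along, the convert-then-tidy flow Simon describes can be sketched roughly as below. This is only an illustrative outline, not the contents of quarto_markdown.py: the function names tidy_markdown and html_to_qmd_text are hypothetical, and the specific tidy-up rules (trimming trailing spaces, collapsing blank lines) are assumptions about what "tidying" might involve. The only library call used is markdownify's documented markdownify() function.

```python
import re


def tidy_markdown(markdown: str) -> str:
    """Hypothetical clean-up pass applied after conversion.

    Assumed rules: strip trailing whitespace on each line and collapse
    runs of blank lines. Note that stripping trailing spaces also removes
    Markdown hard line breaks written as two trailing spaces.
    """
    lines = [line.rstrip() for line in markdown.splitlines()]
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip() + "\n"


def html_to_qmd_text(html: str, convert=None) -> str:
    """Convert HTML to Markdown text suitable for writing to a .qmd file.

    `convert` is a hypothetical hook for swapping in another converter;
    by default the markdownify package is used.
    """
    if convert is None:
        # Imported lazily so tidy_markdown works without the dependency.
        from markdownify import markdownify

        def convert(source: str) -> str:
            return markdownify(source, heading_style="ATX")

    return tidy_markdown(convert(html))
```

The result of html_to_qmd_text can then be written to a file ending in .qmd. Any pre-processing of the HTML styles, which Simon mentions was needed for the IPCC input, would happen before the convert step and is not shown here.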
Hi @06maHi, do you want to have a go at running the Python script Simon Bowie has added here? You can fork this repo if you like. Please feel free to ask any questions or reach out for help if you need it. You don't need to run Quarto as well, but you can if you like. Info here: https://nfdi4culture.github.io/FSCI-Class-Publishing-from-Collections/#_5_0
Hello @mrchristian, I have installed the markdownify package to convert HTML into Markdown. Let me know if this is the correct one. I will also need some help using it.
The first step would just be to see if the script runs in a fork of your own.
To run the HTML to Markdown conversion you only need to run the script in the directory /python. You don't need to run Quarto, although you can if you like.
The requirements need installing: https://github.com/semanticClimate/city-climate-plans-notebook/blob/main/python/requirements.txt
script - https://github.com/semanticClimate/city-climate-plans-notebook/tree/main/python
I think in the script we'd have to change the local paths, and also check whether the HTML from PMR gets copied across first.
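One way to avoid editing local paths by hand in each fork would be to pass them on the command line. The sketch below is only a suggestion and does not reflect how the current script works: parse_args, the input_html argument and the --output option are all hypothetical names, built on the standard argparse and pathlib modules.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read the input and output paths from the command line.

    All argument names here are hypothetical, not taken from the
    existing script.
    """
    parser = argparse.ArgumentParser(
        description="Convert an HTML file to a Quarto .qmd file."
    )
    parser.add_argument("input_html", type=Path, help="path to the source HTML file")
    parser.add_argument(
        "-o",
        "--output",
        type=Path,
        default=None,
        help="path of the .qmd file to write (default: next to the input)",
    )
    args = parser.parse_args(argv)
    if args.output is None:
        # Default: same folder and name as the input, with a .qmd suffix.
        args.output = args.input_html.with_suffix(".qmd")
    return args
```

With something like this in place, the same script could be run unchanged in a clone or a fork by passing the HTML path as the first argument.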
A few questions: Did you edit the local paths to get the script to work? Are you running it on a clone or a fork of the repository? If you're up for it, I would suggest trying to install the Quarto framework. Instructions are in the README.md https://github.com/semanticClimate/city-climate-plans-notebook/blob/main/README.md and you can read more about installation and use here: https://nfdi4culture.github.io/FSCI-Class-Publishing-from-Collections/
Yes, I have edited the local path and also used a fork of the repo. I will go for the Quarto installation once I am back at work.
https://github.com/semanticClimate/city-climate-plans-notebook