Closed capac closed 2 months ago
Looks like they're using their own format for docs: https://github.com/catboost/catboost/tree/master/catboost/docs
You can probably only import it as a generic HTML document, as documented here: https://kapeli.com/docsets#dashDocset
I tried following your suggestion to import the generic HTML files, but as far as I can tell the HTML may not be so "generic". I know very little about JavaScript, but I think the HTML uses a JavaScript library (probably app.client.js) to extract the data from a code block embedded in the HTML file itself and then render it in the browser. This can already be seen in the index.html file in the root of the CatBoost documentation directory. The Python script I found that populates the SQLite index (which uses BeautifulSoup) can't parse any of the HTML code from the index.html file. To make this clearer, I've attached a portion of the HTML code from the index.html file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="title" content="CatBoost">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>CatBoost | CatBoost</title>
<style type="text/css">
body {
height: 100vh;
}
</style>
<link type="text/css" rel="stylesheet" href="../_bundle/app.client.css" />
</head>
<body class="yc-root yc-root_theme_light">
<div id="root"></div>
<script type="application/javascript">
window.STATIC_CONTENT = false
window.__DATA__ = {"data":{"leading":true,"toc":{"title":"CatBoost","href":"index.html","items":[{"name":"Installation","expanded":true,"items":[{"href":"concepts/installation.html","name":"Overview","id":"Overview-0-0.47378348458599273"},
[...]
{"title":"Videos","href":"concepts/educational-materials-videos"}]}]},"meta":{"title":"CatBoost","style":[],"script":[]}},"router":{"pathname":"index.html"},"lang":"en"};
</script>
<script type="application/javascript" src="../_bundle/app.client.js"></script>
</body>
</html>
The data block looks a lot like JSON, so a good idea would probably be to modify the script to parse the JSON block instead of the HTML. Do you have any other suggestions? Thanks a lot.
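For illustration, here is a minimal sketch of what parsing that block could look like. It assumes the page data is always assigned as window.__DATA__ = {...}; as in the excerpt above, and that table-of-contents entries carry an href plus either a name or a title key. The function names extract_page_data and iter_toc_entries are hypothetical, not part of any existing script.

```python
import json

# Assumption: the page data always follows this exact assignment,
# as in the index.html excerpt above.
MARKER = "window.__DATA__ ="


def extract_page_data(html: str) -> dict:
    """Return the JSON object assigned to window.__DATA__ in the page."""
    start = html.find(MARKER)
    if start == -1:
        raise ValueError("no window.__DATA__ assignment found")
    # raw_decode does not skip leading whitespace, so strip it first.
    payload = html[start + len(MARKER):].lstrip()
    # raw_decode parses one JSON value and ignores what follows it
    # (here the trailing ";" and the closing </script> tag).
    data, _end = json.JSONDecoder().raw_decode(payload)
    return data


def iter_toc_entries(items):
    """Yield (label, href) pairs from a nested list of TOC items."""
    for item in items:
        if "href" in item:
            # Entries in the excerpt use "name" or "title" for the label.
            yield item.get("name") or item.get("title"), item["href"]
        # Section nodes keep their children under "items".
        yield from iter_toc_entries(item.get("items", []))
```

Calling iter_toc_entries(extract_page_data(html)["data"]["toc"]["items"]) would then give the page labels and relative paths without needing BeautifulSoup to see any rendered markup.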
Cheers, Angelo
I mean if it is important to you and you're so inclined, you can try to write a custom parser: https://doc2dash.hynek.me/en/stable/extending/ If it's some client-side shenanigans, there might be a chance to find the data somewhere in JSON form or something.
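If a full custom parser is more than you need, the generic route still works once you have (name, path) pairs from wherever the data lives. The sketch below fills the searchIndex table described in the Kapeli docset guide; the function name write_search_index and the choice of "Guide" as the entry type are assumptions for illustration, and in a real docset the database would be Contents/Resources/docSet.dsidx.

```python
import sqlite3


def write_search_index(db_path, entries, entry_type="Guide"):
    """Insert (name, path) pairs into a Dash searchIndex table.

    Returns the number of rows in the table afterwards.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS searchIndex("
                "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)"
            )
            # The unique index lets INSERT OR IGNORE skip duplicates.
            conn.execute(
                "CREATE UNIQUE INDEX IF NOT EXISTS anchor "
                "ON searchIndex (name, type, path)"
            )
            conn.executemany(
                "INSERT OR IGNORE INTO searchIndex(name, type, path) "
                "VALUES (?, ?, ?)",
                [(name, entry_type, path) for name, path in entries],
            )
        return conn.execute("SELECT COUNT(*) FROM searchIndex").fetchone()[0]
    finally:
        conn.close()
```

This only builds the index; the HTML files themselves still have to be copied into the docset's Documents folder, and pages that render entirely client-side may not display well inside Dash.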
I'm trying to generate Dash documentation for CatBoost, but even after successfully building the HTML documentation by following the instructions in the README, I get the error message shown below the command:
doc2dash -n "catboost 1.2.5" -d "/Users/angelo/Library/Application Support/doc2dash/DocSets/catboost/1-2-5/" --icon-2x "/Users/angelo/Pictures/Icons/dash/catboost/icon@2x.png" -v -j -u "https://catboost.ai/en/docs/" -I "/Users/angelo/Programming/docs/catboost/docs-gen/en/index.html" ./ -a -f
"/Users/angelo/Programming/docs/catboost/docs-gen/en" does not contain a known documentation format.
Any suggestions?
Angelo