Open manycoding opened 6 years ago
Say we scrape one website across different categories. In this case all items will share the same root URL, e.g. https://pandas.pydata.org/pandas-docs/stable/categorical.html and https://pandas.pydata.org/pandas-docs/stable/merging.html have https://pandas.pydata.org/pandas-docs/stable/ in common.
By returning this information, we can analyze URLs without a JSON schema.
pandas-profiling does something similar: https://github.com/pandas-profiling/pandas-profiling/blob/master/pandas_profiling/model/base.py#L122
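To make the request concrete, here is a minimal sketch of how the common root could be computed. The function name `common_url_root` is hypothetical and not part of any existing API, and the approach (the standard library's `os.path.commonprefix`, trimmed back to the last `/`) is just one possible way to do it, not necessarily what pandas-profiling does.

```python
import os
from urllib.parse import urlsplit


def common_url_root(urls):
    """Return the longest shared URL prefix that ends at a "/" boundary.

    Hypothetical helper for illustration; returns "" when the URLs
    share no root that includes a host.
    """
    urls = [u for u in urls if u]
    if not urls:
        return ""
    # commonprefix compares character by character, so the result may
    # stop in the middle of a path segment.
    prefix = os.path.commonprefix(urls)
    # Trim back to the last "/" so the root ends at a path boundary.
    root = prefix[: prefix.rfind("/") + 1]
    # Discard results such as "https://" that contain no host.
    return root if urlsplit(root).netloc else ""


urls = [
    "https://pandas.pydata.org/pandas-docs/stable/categorical.html",
    "https://pandas.pydata.org/pandas-docs/stable/merging.html",
]
print(common_url_root(urls))
# https://pandas.pydata.org/pandas-docs/stable/
```

An empty result would indicate that the items come from different hosts, which could itself be useful to report.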