[Bug] Extraneous back slashes

iuliaturc commented 5 days ago

When scraping https://huggingface.co/docs/transformers/main_classes/pipelines, I'm seeing a lot of extraneous backslashes at the ends of lines:

Firecrawl Markdown:

### FillMaskPipeline\
\
### classtransformers.FillMaskPipeline\
\
[<source>](https://github.com/huggingface/transformers/blob/v4.21.2/src/transformers/pipelines/fill_mask.py#L34)\
\
(model: typing.Union\[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')\]tokenizer: typing.Optional\[transformers.tokenization\_utils.PreTrainedTokenizer\] = Nonefeature\_extractor: typing.Optional\[ForwardRef('SequenceFeatureExtractor')\] = Nonemodelcard: typing.Optional\[transformers.modelcard.ModelCard\] = Noneframework: typing.Optional\[str\] = Nonetask: str = ''args\_parser: ArgumentHandler = Nonedevice: int = -1binary\_output: bool = False\*\*kwargs)\
\
Parameters\
\
- **model** ( [PreTrainedModel](/docs/transformers/v4.21.2/en/main_classes/model#transformers.PreTrainedModel) or [TFPreTrainedModel](/docs/transformers/v4.21.2/en/main_classes/model#transformers.TFPreTrainedModel)) —\
The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from\
[PreTrainedModel](/docs/transformers/v4.21.2/en/main_classes/model#transformers.PreTrainedModel) for PyTorch and [TFPreTrainedModel](/docs/transformers/v4.21.2/en/main_classes/model#transformers.TFPreTrainedModel) for TensorFlow.\

Note that these backslashes don't always show up. For instance, when I scrape https://huggingface.co/transformers/main_classes/tokenizer.html#transformers, I get cleaner Markdown:

## PreTrainedModel

### classtransformers.PreTrainedModel

[<source>](https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/modeling_utils.py#L1297)

(config: PretrainedConfig\*inputs\*\*kwargs)

Base class for all models.

[PreTrainedModel](/docs/transformers/v4.44.2/en/main_classes/model#transformers.PreTrainedModel) takes care of storing the configuration of the models and handles methods for loading,
downloading and saving models as well as a few methods common to all models to:

nickscamara commented 5 days ago

Interesting, ccing @tomkosm here.

rafaelsideguide commented 1 day ago

Hey @iuliaturc, thanks for bringing this up! The backslashes you’re seeing come from the way our markdown parser handles text that’s part of a link or button. In this case, the text you’re referring to is likely inside an expandable block (with the "expand 14 parameters" button). The parser adds these backslashes to preserve the link functionality within markdown.

We’ll be closing this issue as "not planned," but feel free to reopen it or create a new issue if needed. Let me know if you have any further questions!
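
For anyone who wants to clean this output on their own side in the meantime, here is a minimal post-processing sketch. It is not part of Firecrawl; `strip_trailing_backslashes` is a hypothetical helper name. It relies on the fact that in CommonMark a single backslash at the end of a line is a hard line break, and it assumes you don't need those hard breaks preserved. Inline escapes such as `\[` and `\_` are left untouched.

```python
def strip_trailing_backslashes(markdown: str) -> str:
    """Remove backslash hard line breaks from the ends of Markdown lines.

    Hypothetical helper, not a Firecrawl API. A line ending in an odd number
    of backslashes ends with a hard-break marker, which is dropped. An even
    number means the backslashes are escaped literals, so the line is kept.
    """
    cleaned = []
    for line in markdown.split("\n"):
        stripped = line.rstrip()
        # Count how many backslashes sit at the very end of the line.
        trailing = len(stripped) - len(stripped.rstrip("\\"))
        if trailing % 2 == 1:
            # Drop only the final backslash (the hard-break marker).
            cleaned.append(stripped[:-1])
        else:
            cleaned.append(line)
    return "\n".join(cleaned)


sample = "### FillMaskPipeline\\\n\\\n### classtransformers.FillMaskPipeline\\\n"
result = strip_trailing_backslashes(sample)
```

Note that this sketch does not track fenced code blocks, so a trailing backslash inside a code sample (for example, a shell line continuation) would also be removed.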

iuliaturc commented 20 hours ago

Thanks for the explanation!

mendableai / firecrawl

[Bug] Extraneous back slashes #662