pgoldweic closed this issue 1 year ago.
I checked it to make sure, and I can't reproduce the issue.
```javascript
{
  baseElements: { selectors: ['div.foo'] },
  selectors: [
    { selector: 'div.foo', format: 'blockTag', options: { leadingLineBreaks: 5 } }
  ]
}
```
This works fine in my experiments: the elements are selected and formatted accordingly.
That's how it works in the code. Selected base elements are processed by the same rules as any child elements.
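To make that rule concrete, here is a toy model. It is not html-to-text's actual code. It assumes a simplified node shape `{ tag, cls, text?, children? }` and uses hypothetical helpers `findBase`, `render` and `convertSketch`. The base elements are located first. Each one then goes through the same rule lookup as its descendants, so a rule written for `div.foo` also applies to the `div.foo` base element itself.

```javascript
// Toy model only: NOT html-to-text's implementation.
// A node is { tag, cls, text?, children? }; a rule pairs a selector with a formatter.

function matches(node, selector) {
  // Supports only "tag" or "tag.class" selectors, enough for this illustration.
  const [tag, cls] = selector.split('.');
  return node.tag === tag && (cls === undefined || node.cls === cls);
}

function findBase(node, selector, found = []) {
  // Collect base elements in document order; don't descend into a matched element.
  if (matches(node, selector)) {
    found.push(node);
    return found;
  }
  for (const child of node.children || []) findBase(child, selector, found);
  return found;
}

function render(node, rules) {
  // The same rule lookup runs for the base element and for every child.
  const rule = rules.find((r) => matches(node, r.selector));
  const inner = node.text ?? (node.children || []).map((c) => render(c, rules)).join('');
  return rule ? rule.format(inner) : inner;
}

function convertSketch(root, baseSelector, rules) {
  // Only the base elements are rendered; everything outside them is dropped.
  return findBase(root, baseSelector).map((base) => render(base, rules)).join('');
}

const doc = {
  tag: 'body',
  children: [
    { tag: 'p', text: 'ignored' },
    { tag: 'div', cls: 'foo', children: [{ tag: 'span', text: 'Title' }] },
  ],
};

// Hypothetical rule: two leading line breaks, then uppercase the content.
const rules = [{ selector: 'div.foo', format: (s) => '\n\n' + s.toUpperCase() }];

console.log(JSON.stringify(convertSketch(doc, 'div.foo', rules))); // prints: "\n\nTITLE"
```

The paragraph outside `div.foo` is never rendered, while the `div.foo` rule still formats the selected element. In this model, restricting the output and formatting the selected element are independent steps.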
I can't see any typos in your second example (the `block` formatter doesn't have anything to do with the `uppercase` option, but that's irrelevant to the described issue).
Make sure you are running what you think you are running.
Thanks @KillyMXI for your prompt response! However, I still see the text of the entire document in my tests... this is very odd. I have double-checked that my syntax is correct and haven't found anything wrong yet. I've also tried the 'heading' format instead of 'block' to see if that changes anything, but the output is the same. Let me know if you have any other ideas. Thanks!
I don't have enough information to even guess.
How do you run your code? If in Node.js, then what Node version is it? Are you using html-to-text version 9.0.3? Is there any chance you're editing one file but testing another? Are you preprocessing your html in any way before converting?
Try to make an isolated example: `npm init` a separate package, run `npm i html-to-text`, and in `index.js` do just the conversion, similar to the example but with your HTML and options. Run it with `node ./index.js`.
Does the issue persist this way? If yes, then I'd like to take a look at the reproduction example (code and html). If no, then you'd have to keep narrowing on the cause of the issue in your pipeline differences.
Ok @KillyMXI, I think I figured out how to resolve the problem, although I'm not sure I can explain it myself (most likely I misunderstood the configuration instructions for better performance, that is, the `compile` option). This morning I had changed the line in my script that read:

```javascript
const { convert } = require('html-to-text')
```

to:

```javascript
const { compile } = require('html-to-text')
const convert = compile({
  wordwrap: 130
})
```

and then used `convert` just as I had before the change. However, this broke the code in the way I described earlier. When I changed it back to the original `convert` import, it started working again. From this I conclude that the `compile` approach is probably not appropriate for regular use.
```javascript
const { compile } = require('html-to-text')
const convert = compile({ ...options }) // options here
const text = convert(html) // no options here
```

This `convert` is different: it already has the options baked in. You can't add more options later when you call it.
`compile` is recommended when you have to process many documents with the same options.
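The difference can be sketched with a plain closure. `compileSketch` below is hypothetical. It only mimics the shape of the compile-once pattern and is not html-to-text's source. The `uppercase` option and the tag stripping are stand-ins for real formatting. The options are merged once, when the converter is created. Anything passed to the returned function after the HTML is ignored.

```javascript
// Hypothetical sketch of the compile-once pattern; not html-to-text's source.
const DEFAULTS = { wordwrap: 80, uppercase: false };

function compileSketch(options = {}) {
  // Options are merged exactly once and captured by the closure.
  // (wordwrap is stored but not implemented in this toy.)
  const settings = Object.freeze({ ...DEFAULTS, ...options });

  // The returned converter reads only `html`; extra arguments are ignored.
  return function convert(html) {
    const text = html.replace(/<[^>]*>/g, ''); // crude tag stripping, illustration only
    return settings.uppercase ? text.toUpperCase() : text;
  };
}

// Mistake: options passed at call time are silently dropped.
const convertA = compileSketch({ wordwrap: 130 });
console.log(convertA('<h1>Hi</h1>', { uppercase: true })); // prints: Hi

// Intended use: put every option into the compile step, then reuse the converter.
const convertB = compileSketch({ wordwrap: 130, uppercase: true });
console.log(['<h1>Hi</h1>', '<h1>Bye</h1>'].map((h) => convertB(h))); // prints: [ 'HI', 'BYE' ]
```

Doing the option setup once and reusing the resulting function is what makes this pattern pay off for batches of documents. The cost is that every option must be known up front.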
Perhaps I can improve the documentation a bit to make the difference clearer.
That sounds like a good idea @KillyMXI . Thanks for your explanation!
I updated readme a bit. That will hopefully reduce the chance of such confusion.
Documentation is due for a rework. I'm not paying much attention to it for now, until I get to properly reorganizing it.
I am trying to retrieve text for a specific selector ONLY, with specific options for that selector. I've tried the following:
which works correctly but does NOT apply any options for how I want the title to show up. When I try this instead:
I get the text for the whole document and NOT just for the `baseElements` selectors. What am I doing wrong? Or is there no way to specify formatting for the base element selectors?