html-to-text / node-html-to-text

Advanced html to text converter

How do I prevent newlines when using a whole page as a JSON object, which should be a single line? #317

Closed juhorissanen5 closed 4 months ago

juhorissanen5 commented 4 months ago

I'm trying to convert a whole HTML page into a single line, but I still get multiple lines even with all of these selector options:

    const options = {
      wordwrap: false,
      preserveNewlines: false,
      selectors: [
        { selector: '*', format: 'inline' },
        // Block-level elements
        { selector: 'article', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'aside', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'div', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'footer', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'form', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'header', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'main', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'nav', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'section', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        // Handling <br> specifically
        { selector: 'br', format: 'skip' }, // Skip <br> tags entirely
        // Adjusting heading elements
        { selector: 'h1', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'h2', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'h3', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'h4', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'h5', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'h6', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        // Other elements that might introduce breaks or spacing
        { selector: 'p', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'blockquote', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'ol', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'ul', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
        { selector: 'table', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
      ]
    };

In the end I convert the result to a JSON object with JSON.stringify(), but the output still spans multiple lines. How can I get it onto a single line?

KillyMXI commented 4 months ago

Are you familiar with escape sequences in JSON?

https://stackoverflow.com/questions/3020094/how-should-i-escape-strings-in-json

Because you might be solving the wrong problem here. It would be strange if people used JSON everywhere and never thought about how to put multiline strings in it, wouldn't it? It sounds like you're handcrafting a JSON string (there should be better ways), and you may not fully understand what you're doing and observing with JSON.stringify(), or what a JSON object is.
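To illustrate the point about escape sequences, here is a minimal sketch. The sample string and the `body` property name are made up for the example; only standard `JSON.stringify()` and `JSON.parse()` are used. It shows that a string with real line breaks is serialized with them escaped, so the JSON text itself contains no raw newline:

```javascript
// A string containing real line breaks, standing in for html-to-text output.
// (The content and the "body" key are placeholders for this sketch.)
const text = 'First line\nSecond line\r\nThird line';

// JSON.stringify writes each line break as an escape sequence
// (backslash + "n" or backslash + "r"), not as a raw newline character.
const json = JSON.stringify({ body: text });

// The serialized JSON is a single line.
const hasRawNewline = json.includes('\n') || json.includes('\r');
console.log(hasRawNewline); // false

// Parsing the JSON restores the original line breaks unchanged.
const roundTripped = JSON.parse(json).body;
console.log(roundTripped === text); // true
```

If the output still appears to span several lines, it is worth checking whether the parsed value is being printed rather than the serialized string.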


Then, just in case you really need a single line from html-to-text:

Are you familiar with CSS selector specificity? html-to-text tries to stay close to it where possible, though there are some differences.

{ selector: '*', format: 'inline' } is already defined:

https://github.com/html-to-text/node-html-to-text/blob/5b7ca1c1a736a730c9a4fa1b6db6172e50f4ee3e/packages/html-to-text/src/html-to-text.js#L44

It has minimal possible specificity and only affects the tags that don't have any format specified.

If you need everything inline, you have to override the format for the tags that have a different format predefined.
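As a rough sketch of that override: the tag list below is an assumption about which tags ship with a non-inline predefined format, so check it against the library version in use. The `hr` entry is also an addition that was not in the original options. The idea is to give each such tag an explicit `format: 'inline'` instead of tuning `leadingLineBreaks` and `trailingLineBreaks`:

```javascript
// Tags assumed to have a non-inline predefined format (illustrative list,
// not taken from the library source; adjust for your version).
const nonInlineTags = [
  'article', 'aside', 'blockquote', 'div', 'footer', 'form',
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
  'header', 'main', 'nav', 'ol', 'p', 'pre', 'section', 'table', 'ul',
];

const options = {
  wordwrap: false,
  selectors: [
    // Override the predefined format of each tag with 'inline'.
    ...nonInlineTags.map((tag) => ({ selector: tag, format: 'inline' })),
    // As in the original question: drop <br> instead of emitting a line break.
    { selector: 'br', format: 'skip' },
    // Assumption: also drop horizontal rules, which would otherwise add lines.
    { selector: 'hr', format: 'skip' },
  ],
};

// Usage, assuming the html-to-text package is installed:
// const { convert } = require('html-to-text');
// const text = convert(html, options);
console.log(options.selectors.length); // 23
```

Skipping `br` means the text on either side of it is joined without a separator, which may or may not be what is wanted.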

options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } only makes sense for block-level tags, and can only get you as far as reducing the number of line breaks around the tag to 1. People have come here asking why setting those numbers to zero doesn't work; I'm surprised I don't see that here. It doesn't work because it makes no sense: use the proper format instead.

(Maybe I should introduce an option to ignore the predefined formats and start with a blank array of selectors. Or somehow lower the specificity of the predefined selectors, but I'd have to see whether that leads to unexpected effects... So far, this issue seems too ill-conceived to confirm the necessity of this change.)

juhorissanen5 commented 4 months ago

Thank you for the answer. I couldn't give you all the context because I actually use the library in production, so I didn't want to share more of my code. I will continue to study this library and may use it in future production work as well.