Added byte offsets to rule sets

G0dwin commented 7 years ago

I have added some basic logic to the parser to capture byte offsets when scanning for rule sets and added a field to RuleSet to retain them. This will allow code coverage and lint utilities (for example) to report where in a file a particular rule was found (see the CSS code coverage utility I'm working on here: lingua-franca/marmara).

There is still more work that needs to be done, primarily when it comes to differentiating files (from @import statement or otherwise), and when urls are expanded.

grosser commented 7 years ago

idk about this ... kind of neat feature ... but more work/memory for the 99% usecase only thing striking me as a bit unfortunate is the addition of more optional argument to a few methods ... would be nice to make them keyword args ... but that's not easy without breaking existing usage ...

@akzhan thoughts ?

akzhan commented 7 years ago

It's nice contribution, but undone.

We need any use case to be bundled with (may be in examples folder or in wiki or read.me).

akzhan commented 7 years ago

Also it's not clear to detect line endings by Windows platform.

We may use "read binary mode" to keep line endings of file.

grosser commented 7 years ago

FYI https://github.com/giakki/uncss

On Thu, Mar 9, 2017 at 12:23 AM, Akzhan Abdulin notifications@github.com wrote:

Also it's not clear to detect line endings by Windows platform.

We may use "read binary mode" to keep line endings of file.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/premailer/css_parser/pull/85#issuecomment-285285702, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAsZwR4JguYFEaEOPpY2YBTKWIMbJdbks5rj7cMgaJpZM4MXoM_ .

G0dwin commented 7 years ago

@akzhan I can add documentation before merging, however, I'm not seeing a wiki, only the README.md file.

@grosser I can refactor so that we only collect offsets if an option is passed to load_uri! and try to ensure the former code path remains as close as possible to the original but only add the additional overhead when the option is present.

To get around the additional optional arguments, we could subclass RuleSet to something like FileRuleSet where the additional code for storing offsets and anything else specific to relating back to the original CSS code could be kept. In order to support import statements and multiple files in general, we will also need to store a filename.

akzhan commented 7 years ago

@G0dwin ok, we need

a) a section in README. b) separate position-aware code. c) replace CRLF detection with byte-precise code (File.read('rb') etc.).

G0dwin commented 7 years ago

Alright: a) I added a section in the README b) I added the option :capture_offsets to load_uri!, load_file!, and load_string!, we only look for, process, and store offsets if this flag is set. I also created a subclass of RuleSet called OffsetAwareRuleSet. This class takes in additional constructor params and stores the offsets. We only create the subclass if the flag is set. c) I fixed CRLF detection though fixing the encoding, I didn't change the IO.read(...) call. I tested on Windows and Ubuntu machines to ensure the offsets were identical.

In addition, I added a benchmark rake task (rake benchmark) to test the difference between the old code and the new. I don't have an isolated machine to run the tests on but the difference seemed negligible, the last test I ran, parsing import.css 50 000 times took 50.7 seconds using the old code and 47.1 seconds with the new. Parsing screen.css 5000 times from dialect.ca took 65.9 seconds using the old code, and 63.7 seconds after my change. I would account the decrease to either a random variance, or due to some minor fixes I made to the code base along the way.

I also added a filename attribute to the OffsetAwareRuleSet with which I was able to get imports working with offset capturing. I removed the code I added to tests and isolated offset capturing tests into one test file.

Let me know if there are any other changes that you would like to see before merging, big or small.

Cheers, godwin

akzhan commented 7 years ago

LGTM

akzhan commented 7 years ago

Some commits related to rebase (Gemfile.lock, version up), don't worry.

akzhan commented 7 years ago

Just note that is't released as 1.5.0.pre.

Other updates should be proposed by other pull requests.

premailer / css_parser

Added byte offsets to rule sets #85