Example parsers - Githubissues

Geal commented 9 years ago

We currently have a few example parsers. In order to test the project and make it useful, other formats can be implemented. Here is a list, if anyone wants to try it:

text file formats:
- [x] INI
- [x] FASTQ
- [x] libconfig-like configuration file format
- [x] torrc configuration file
- [x] ISO 8601 dates
- [x] Web archive
- [x] TOML
- [x] bencode
- [x] CSV
- [ ] YAML
- [ ] CommonMark
audio, video and image formats:
- [x] MP4 (partial implementation)
- [x] GIF
- [x] FLAC
- [x] FLV
- [x] MKV
- [ ] OGG
- [ ] MPEG TS
- [ ] AVI
- [ ] PNG
- [ ] JPEG
- [ ] EXIF
- [ ] MP3
document formats:
- [x] torrent files
- [x] TAR
- [ ] PDF
- [ ] MS-CFB (compound format, used in doc, xls, ppt, cab, msi files)
- [ ] GZ
- [ ] ZIP
- [ ] RAR
- [ ] binary PLIST
database formats:
- [x] Redis database files
- [x] Ceph crush maps
network protocol formats:
- [x] IRC
- [x] Pcap-NG
- [x] IP
- [x] Ethernet
- [x] PCAP
- [x] NTP
- [x] SNMP
- [x] TLS
- [ ] TCP
- [ ] UDP
- [ ] DNS
executable formats:
- [ ] Portable executables (PE)
- [ ] ELF
- [ ] GameBoy ROM
crypto related:
- [x] ASN.1
- [ ] X.509 certificates
- [ ] DER public and private keys
- [ ] SSL/TLS packets
- [ ] OpenPGP
programming languages:
- [x] Rust
- [ ] Lua
- [ ] Python
- [ ] C
interface definition formats:
- [x] Thrift
- [x] Protobuf
- [ ] AIDL

nickbabcock commented 8 years ago

Boxcars is an example of a Rocket League replay parser with serde serialization. Let boxcars be a good example of Rust code using nom and serde, as extensive examples are hard to come by. While it lacks user-friendly error messages, among other issues, its tests and documentation strive to be thorough.

dtolnay commented 8 years ago

Yeah, I'm aware of the scale problem of Rust. I don't want to write that one, but I think it's a good holy grail for any parser library written in Rust.

As of version 0.10.0, syn is now able to parse practically all of Rust syntax. One of my test cases is to parse the entire github.com/rust-lang/rust repo into an AST and print it back out, asserting that the output is identical to the original.

I am technically not using nom but instead a fork which removes the IResult::Incomplete variant. I found that the extra macro code generated to handle Incomplete was more than doubling the compile time for something that I didn't even want. Nevertheless, the code is enough like nom that I think we can check off the box.

Example snippet to parse one arm of a match expression:

named!(match_arm -> Arm, do_parse!(
    attrs: many0!(outer_attr) >>
    pats: separated_nonempty_list!(punct!("|"), pat) >>
    guard: option!(preceded!(keyword!("if"), expr)) >>
    punct!("=>") >>
    body: alt!(
        map!(block, |blk| ExprKind::Block(BlockCheckMode::Default, blk).into())
        |
        expr
    ) >>
    (Arm {
        attrs: attrs,
        pats: pats,
        guard: guard.map(Box::new),
        body: Box::new(body),
    })
));

Geal commented 8 years ago

@dtolnay syn is an amazing example, thanks for your hard work :)

Geal commented 8 years ago

@dtolnay could I get your input on #356? It might fix your issues with compile times, so I'd like to get your thoughts on this.

J-F-Liu commented 7 years ago

I am writing a PDF library using nom to parse PDF syntax. Released v0.1.0 just now. https://github.com/J-F-Liu/lopdf

valarauca commented 7 years ago

So I've implemented an EDI parser for the ANS standard EDI for work with this. Awesome library, really useful. Sadly, that parser is owned by my employer.

I've started implementing an x64 assembler with nom. I'm really struggling with writing the parser. The main reason is register names have a lot of overlap, and are very short. For example r8, r8w, r11, and r12d. Ideally I want to map these to an enum. map!() makes this easy, but how can I match those terms in nom?
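One common answer to the overlap problem is to prefer the longest matching name, so that `r8w` is never cut short to `r8`. The sketch below shows the idea in plain Rust, without nom, so it does not depend on any particular nom version. The `Register` enum, the `REGISTERS` table and `parse_register` are hypothetical names, and only the four registers mentioned above are covered. With nom's `alt` combinator, which tries its alternatives in order, the same effect would presumably come from listing longer names before their prefixes.

```rust
/// A handful of x86-64 registers (illustrative; the real set is much larger).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Register {
    R8,
    R8w,
    R11,
    R12d,
}

/// Hypothetical name table. Its order does not matter, because
/// `parse_register` always picks the longest name that matches.
const REGISTERS: &[(&str, Register)] = &[
    ("r8", Register::R8),
    ("r8w", Register::R8w),
    ("r11", Register::R11),
    ("r12d", Register::R12d),
];

/// Try to read a register name at the start of `input`.
/// Returns the remaining input and the register, or `None` if nothing matches.
fn parse_register(input: &str) -> Option<(&str, Register)> {
    REGISTERS
        .iter()
        .filter(|(name, _)| input.starts_with(*name))
        // Prefer the longest name so "r8w" is not cut short to "r8".
        .max_by_key(|(name, _)| name.len())
        .map(|(name, reg)| (&input[name.len()..], *reg))
}
```

Here `parse_register("r8w, 1")` yields `Register::R8w` with `", 1"` left over. A real assembler would probably also reject a match that is immediately followed by another identifier character.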

Keruspe commented 7 years ago

I converted several "keys" to enum values in my brainfuck parser, might or might not be relevant to your needs. See the first parsers defined with "named!" https://github.com/Keruspe/brainfuck.rs/blob/master/src/parser.rs
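As a rough illustration of converting single-character "keys" to enum values, here is a minimal sketch in plain Rust. It is not taken from the linked brainfuck.rs parser; `Op`, `op_from_char` and `tokenize` are hypothetical names, and nom is left out so the example stays self-contained.

```rust
/// The eight brainfuck instructions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Right,
    Left,
    Increment,
    Decrement,
    Output,
    Input,
    LoopStart,
    LoopEnd,
}

/// Map one source character to an instruction, if it is one.
fn op_from_char(c: char) -> Option<Op> {
    match c {
        '>' => Some(Op::Right),
        '<' => Some(Op::Left),
        '+' => Some(Op::Increment),
        '-' => Some(Op::Decrement),
        '.' => Some(Op::Output),
        ',' => Some(Op::Input),
        '[' => Some(Op::LoopStart),
        ']' => Some(Op::LoopEnd),
        // Every other character is treated as a comment and skipped.
        _ => None,
    }
}

/// Turn a whole program into a flat list of instructions.
fn tokenize(source: &str) -> Vec<Op> {
    source.chars().filter_map(op_from_char).collect()
}
```

A parser built on top of this would still need to pair up `LoopStart` and `LoopEnd`; the sketch only covers the key-to-enum step.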

przygienda commented 7 years ago

Is there a way to generate EBNF from this? It would be great if that were possible. Great package, BTW.

ithinuel commented 7 years ago

Hi, I just pushed a pcap parser: https://github.com/ithinuel/pcap-rs. It still needs PR #492 to be merged so it can use the official nom crate.

Any feedback is welcome.

bbqsrc commented 7 years ago

A parser for the Mediawiki format would be quite useful.

dwerner commented 7 years ago

@Geal thanks for an awesome library! I used it to write a Wavefront OBJ/MTL 3D mesh parser, nom-obj, which I published to crates.io.

olivren commented 7 years ago

I wrote a parser for the simple key/value text format .properties, which is a standard for Java configuration files. It uses nom 3.1. Can it be added to the list?

This is the first parser I wrote using a Parser Combinator library. If anyone can review my code I would be delighted. Also, I tried to add error reporting to my code, but I gave up after I tried to insert add_return_error and return_error calls all over the place to no avail (in the branch "error-reporting"). Is there an example of a text parser that reports parsing errors?

Edit: I rewrote my library using Pest instead of Nom, as I find it more suited to parsing a text format. I will definitely use nom if I need to parse a binary format, though.

santifa commented 7 years ago

@Geal thanks for this library. I've implemented a parser for URIs which is part of a larger side project for RDF (n3, ttl, ...) parsers. The full ABNF of RFC 3986 is implemented, but the pct-encoding is still a bit messy.
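For readers unfamiliar with the pct-encoding rule, RFC 3986 (section 2.1) defines an escape as `%` followed by two hex digits. The following is a minimal decoding sketch in plain Rust under that rule only. It is not the code from the project mentioned above; `percent_decode` and `hex_value` are hypothetical names, and rejecting non-UTF-8 results is a simplifying assumption.

```rust
/// Decode `%XX` escapes in a URI component.
/// Returns `None` on a malformed escape or if the result is not valid UTF-8.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // An escape is '%' followed by exactly two hex digits.
            let hi = hex_value(*bytes.get(i + 1)?)?;
            let lo = hex_value(*bytes.get(i + 2)?)?;
            out.push((hi << 4) | lo);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Value of a single ASCII hex digit, in either case.
fn hex_value(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}
```

Decoding at the byte level and validating UTF-8 once at the end keeps multi-byte sequences such as `%E2%9C%93` intact.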

dbrgn commented 7 years ago

Here's a parser for ICE candidates SDP (RFC 5245), used for example in WebRTC: https://github.com/dbrgn/candidateparser

kamarkiewicz commented 7 years ago

I wrote a Session Initiation Protocol (RFC3261) low-level push parser with API inspired by seanmonstar/httparse (hyper's HTTP parser): https://github.com/kamarkiewicz/parsip

thejpster commented 6 years ago

I'd be interested in something that could parse SNMP MIB and YANG.

https://en.wikipedia.org/wiki/YANG

ctrlcctrlv commented 6 years ago

The BitTorrent example has been deleted, it seems.

Riduidel commented 6 years ago

As a beginner in the Rust world, I'm quite sure I will say something horribly wrong, but is there any planned support for some XML dialects (typically RSS/Atom)?

dwerner commented 6 years ago

Nothing at all wrong with asking, and I'm sure someone might want to implement one at some point, but this is a list of example parsers written using nom, rather than a list of formats "supported" by nom. An xml parser would be an excellent idea for learning nom, imo.

porglezomp commented 6 years ago

@Riduidel if you're specifically interested in just having parsers for those formats, look at https://github.com/rust-syndication. I don't think there's any nom involved there though.

naturallymitchell commented 6 years ago

HTTP: https://github.com/hjr3/weldr/blob/00481f80ae60bd6b312805245c126c168ab77b36/src/http/parser.rs

vandenoever commented 6 years ago

A parser for Turtle. It passes the test suite in 15ms.

https://github.com/vandenoever/rome/tree/master/src/io/turtle

progval commented 6 years ago

I wrote a Python parser: https://docs.rs/python-parser/

idursun commented 5 years ago

I think the Redis database file format parser is not using nom at all. I couldn't find any reference to nom anywhere.

nelsonjchen commented 5 years ago

@idursun Maybe it refers to this old branch from a year before the last update to master. https://github.com/badboy/rdb-rs/tree/nom-parser

saggit commented 5 years ago

Is there any SQL parser?

naturallymitchell commented 5 years ago

> is there any SQL parser?

it'd seem better to me to import it to an sql engine and interact with that data using Diesel. parsing flat sql files seems very limited.

instead of writing a one-off Rust app to do this, you could add diesel bindings to Torchbear, see https://github.com/foundpatterns/torchbear/issues/85 , then make a Speakeasy library for transporting data from your schema using content model in ContentDB.

then, you could develop a lot further beyond.

ithinuel commented 5 years ago

@naturallymitchell maybe @saggit was simply looking for something to extract some data from a raw sql dump. Like a one-off log analysis tool. :D

MarkMcCaskey commented 5 years ago

I made a GameBoy ROM parser with nom 5! Repository: https://github.com/MarkMcCaskey/gameboy-rom-parser, crate: https://crates.io/crates/gameboy-rom

It's extremely simple and doesn't do much, but the crate provides a useful abstraction over the metadata of GameBoy ROMs.

I'll add more optional validation functions to it and refactor my emulator's ROM code to use it soon.

edit: this post is what inspired me to make this

naturallymitchell commented 5 years ago

> It's extremely simple and doesn't do much, but the crate provides a useful abstraction over the metadata of GameBoy ROMs.

@MarkMcCaskey It could even make sense to refactor it then into a generalized library with config files (like, TOML and YAML, and now SANE). Do you think that'd be too much more work?

dwerner commented 5 years ago

@Geal - I wanted to post my public suffix domain list parser that I wrote a few months back. I couldn't find a performant library that did what I needed, so I grabbed nom and went to work. https://github.com/dwerner/nom-psl

MarkMcCaskey commented 5 years ago

@naturallymitchell

Do you mean specifying the layout of the bytes as data and creating a dynamic data structure from it? That's an interesting idea, but I don't think it'd be too helpful for my use case -- as I see it, the primary value-add of the gameboy rom parser is the data layer that it exposes, which lets the user get things like the game's title as a string or the exact cartridge type and how much ROM and RAM it has as well-named, plain Rust values.

The parser may be implementable with serde deserialize on a repr(C) struct though, which is kind of the reverse of what you're saying, I think... I'm not familiar enough with how serde-derive handles errors though.
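To make the "data layer" idea concrete, here is a minimal sketch in plain Rust of pulling a few header fields out of a ROM image. It is not the gameboy-rom crate's API; `RomHeader` and `parse_header` are hypothetical names. The offsets follow the commonly documented cartridge header layout (title at 0x0134..=0x0143, cartridge type at 0x0147, ROM size code at 0x0148), and the cartridge type is left as a raw byte rather than an enum.

```rust
/// A few cartridge header fields exposed as plain Rust values.
#[derive(Debug, PartialEq, Eq)]
struct RomHeader {
    title: String,
    cartridge_type: u8,
    rom_size_bytes: usize,
}

/// Read the header fields from a ROM image, or `None` if it is too short
/// or the ROM size code is outside the range handled here.
fn parse_header(rom: &[u8]) -> Option<RomHeader> {
    // Title: up to 16 ASCII bytes, padded with zero bytes.
    let title: String = rom
        .get(0x0134..0x0144)?
        .iter()
        .take_while(|&&b| b != 0)
        .map(|&b| b as char)
        .collect();

    let cartridge_type = *rom.get(0x0147)?;

    // ROM size code n means 32 KiB shifted left by n (codes 0 through 8).
    let rom_size_code = *rom.get(0x0148)?;
    if rom_size_code > 8 {
        return None;
    }
    let rom_size_bytes = (32 * 1024) << rom_size_code;

    Some(RomHeader {
        title,
        cartridge_type,
        rom_size_bytes,
    })
}
```

A fuller version would map `cartridge_type` to a well-named enum and also decode the RAM size byte, which is the kind of abstraction the comment above describes.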

o0Ignition0o commented 5 years ago

Just got a 0.0.1 version of an NMEA-0183 parser using nom 5 https://github.com/YellowInnovation/nmea-0183 . I need to have a look at the docs and guidelines (the code is ugly for now) and refactor it :) I hope to submit a pull request adding a clean version of it to the parsers list soon ! :)
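For context, an NMEA 0183 sentence has the shape `$<payload>*<hh>`, where `hh` is the hexadecimal XOR of every byte between `$` and `*`. The sketch below checks that framing in plain Rust. It is not taken from the linked repository; `verify_sentence` and `fields` are hypothetical names, and nom is left out to keep the example self-contained.

```rust
/// Validate the framing and checksum of one NMEA 0183 sentence.
/// Returns the payload between '$' and '*' when the checksum matches.
fn verify_sentence(sentence: &str) -> Option<&str> {
    // Drop a trailing "\r\n" and require the leading '$'.
    let body = sentence.trim_end().strip_prefix('$')?;
    let (payload, checksum_hex) = body.split_once('*')?;

    // The checksum is exactly two hex digits.
    if checksum_hex.len() != 2 || !checksum_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let expected = u8::from_str_radix(checksum_hex, 16).ok()?;

    // XOR of every byte between '$' and '*'.
    let actual = payload.bytes().fold(0u8, |acc, b| acc ^ b);
    if actual == expected {
        Some(payload)
    } else {
        None
    }
}

/// Split a verified payload into its comma-separated fields.
fn fields(payload: &str) -> Vec<&str> {
    payload.split(',').collect()
}
```

Checking the checksum before splitting into fields means later field parsers only ever see sentences that arrived intact.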

kurotych commented 4 years ago

This is a SIP parser: https://github.com/armatusmiles/sipcore/tree/master/crates/sipmsg

Torture test: https://github.com/armatusmiles/sipcore/commit/32040e57435a4d0cf2cb847520e8daafa4d8ad97 ( https://tools.ietf.org/html/rfc4475#section-3.1.1.1 )

Geal commented 4 years ago

@armatusmiles thanks, I added it to the list in 2e58a2c

bionicles commented 4 years ago

Please add OpenCypher to the list... a nice way to parse Graph DB queries could enable a wave of innovation in databases. There are zero legit serverless / autoscaling or decentralized graph databases (like you'd get with a CRDT/ORDT backend for an OpenCypher parser). GunJS is fairly close but JavaScript is not ideal for storage IMHO

OtaK commented 3 years ago

Wrote a UBJSON parser w/ nom. Pretty early version with just parsing, but it does the job.

https://github.com/OtaK/ubjson

https://crates.io/crates/ubjson

NilsIrl commented 3 years ago

There is a PDF parser here: https://github.com/J-F-Liu/lopdf (it requires using the nom_parser feature).

FWIW, with lopdf, the nom parser is much faster than the default parser

atmnk commented 2 years ago

I wrote a tool with its own programming language using nom. Here is the source repo.

erihsu commented 2 years ago

The gds2-parser has been released at https://crates.io/crates/gds2_io. BTW, my pull request is #1497.

manuschillerdev commented 2 years ago

would it be feasible to write an ecmascript/typescript parser with nom as well? Or would the scope be too big for that?

alexrsagen commented 2 years ago

I have written 2 (public) parsers using nom which may be used as examples:

BitTorrent/bencoding: https://github.com/alexrsagen/rustorrent/blob/4076d0ea689a950021164d2fdd412021519e7c68/src/bencode.rs
BitTorrent/torrent files (metainfo) parsing using bencode.rs: https://github.com/alexrsagen/rustorrent/blob/4076d0ea689a950021164d2fdd412021519e7c68/src/torrent/metainfo.rs
MS-CFB: https://github.com/alexrsagen/rs-nomcfb (probably not a perfect implementation, but usable to parse Outlook .msg files and as a basic example for how to write parsers using nom)

edg-l commented 2 years ago

I made a bencode parser (the format used by .torrent files), https://github.com/edg-l/nom-bencode/
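For readers who have not met the format, bencode writes integers as `i<digits>e`, byte strings as `<length>:<bytes>` and lists as `l<values>e`. The sketch below parses those three forms in plain Rust. It is not code from any of the parsers linked above; `Value` and `parse_value` are hypothetical names, dictionaries are omitted for brevity, and the integer handling is more lenient than the format strictly allows.

```rust
/// A subset of bencode values (dictionaries are left out of this sketch).
#[derive(Debug, PartialEq, Eq)]
enum Value {
    Int(i64),
    Bytes(Vec<u8>),
    List(Vec<Value>),
}

/// Parse one value from the front of `input`.
/// Returns the value and the unconsumed remainder, or `None` on malformed input.
fn parse_value(input: &[u8]) -> Option<(Value, &[u8])> {
    match *input.first()? {
        b'i' => {
            // Integer: "i<digits>e". Leading zeros are not rejected here.
            let end = input.iter().position(|&b| b == b'e')?;
            let text = std::str::from_utf8(&input[1..end]).ok()?;
            let n = text.parse::<i64>().ok()?;
            Some((Value::Int(n), &input[end + 1..]))
        }
        b'l' => {
            // List: "l<values>e", parsed recursively until the closing 'e'.
            let mut rest = &input[1..];
            let mut items = Vec::new();
            while *rest.first()? != b'e' {
                let (item, tail) = parse_value(rest)?;
                items.push(item);
                rest = tail;
            }
            Some((Value::List(items), &rest[1..]))
        }
        b'0'..=b'9' => {
            // Byte string: "<length>:<bytes>".
            let colon = input.iter().position(|&b| b == b':')?;
            let len: usize = std::str::from_utf8(&input[..colon]).ok()?.parse().ok()?;
            let start = colon + 1;
            let end = start.checked_add(len)?;
            let bytes = input.get(start..end)?;
            Some((Value::Bytes(bytes.to_vec()), &input[end..]))
        }
        _ => None,
    }
}
```

Returning the remainder alongside the value mirrors the shape of a parser-combinator result, which is what lets the list case call `parse_value` recursively.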

LikeLakers2 commented 2 years ago

Hey there! I was wondering why the Rust parser on this list is syn? From what I can tell, syn does not use nom (although it might have in the past).

Since this is a list of examples of parsers built with nom, I don't see why we should be linking to syn here.

OtaK commented 2 years ago

@LikeLakers2

> Hey there! I was wondering why the Rust parser on this list is syn? From what I can tell, syn does not use nom (although it might have in the past).
>
> Since this is a list of examples of parsers built with nom, I don't see why we should be linking to syn here.

https://github.com/dtolnay/syn/issues/476

syn was using nom until v0.15; this issue was created 3 years before syn dropped its usage of nom. That's why it's still linked here.

You're absolutely correct that it should be removed though.

eatgrass commented 1 year ago

mdict-parser is a parser library for the .mdx dictionary file format: https://github.com/eatgrass/mdict-parser

wjwei-handsome commented 1 year ago

crussmap is a parser library and tool for the .chain file format

https://github.com/wjwei-handsome/crussmap

ifsheldon commented 2 months ago

I am a bit surprised that no one has mentioned HTML! I saw nom_html_parser, but it has long been unmaintained.

rust-bakery / nom

Example parsers #14