sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.45k stars 482 forks source link

Read DIMACS files #16924

Closed malb closed 9 years ago

malb commented 10 years ago

Sage writes dimacs files (for SAT solvers) but doesn't seem to read them.

CC: @nathanncohen @sagetrac-tmonteil

Component: interfaces: optional

Author: Travis Scholl

Branch/Commit: 169849b

Reviewer: Martin Albrecht

Issue created by migration from https://trac.sagemath.org/ticket/16924

videlec commented 10 years ago
comment:3

Hello,

seems to be done in sympy (which is shipped with Sage)

sage: from sympy.logic.utilities.dimacs import load
sage: load('1 2 \n 3')
And(Or(cnf_1, cnf_2), cnf_3)

But I do not know how sympy fits with the current SAT stuff in Sage.

Vincent

malb commented 10 years ago
comment:4

it's very easy to load/parse: just turn each line into a tuple and feed those to the SatSolver class with add_clause. Something like

for line in dimacs_file:
    if line.startswith("c"):
        continue # comment
    if line.startswith("p"):
        continue # header
    line = line.split(" "))
    clause = map(int,[e for e in line if e])
    clause = clause[:-1] # strip trailing zero
    solver.add_clause(clause)
edd8e884-f507-429a-b577-5d554626c0fe commented 10 years ago
comment:5

This is perhaps off-topic. I have read the dimacs.py file and i am surprised that the class maintains the data as an opened file (see self._tail). I am thinking of the case, where the data is a string given by propcalc.formula("a|b").satformat(), it looks weird to me that the class has to create a file on the disk for that. Is there a reason for that design choice that i could be missing ? Why not maintaining as a list of tuples of integers (corresponding to a list of clauses), and opening a file only when writing and loading data to/from a file ?

malb commented 10 years ago
comment:6

Hi, the idea was to avoid having the clauses in memory twice: once in the solver and once (in a bloated format) in Python. The rationale behind this was that in my applications these SAT clauses can be rather big (think 1-2GB).

tscholl2 commented 9 years ago

Author: Travis Scholl

tscholl2 commented 9 years ago

Commit: d9274c6

tscholl2 commented 9 years ago

Branch: u/tscholl2/read_dimacs_16924

tscholl2 commented 9 years ago
comment:7

I added a read method as described in the comments here. I split it into two methods, a read method with the usual read(filename) signature and also a load(file_object) method which actually reads the lines from the file. This way it's easy to use other kinds of buffers besides files. Also it allows easy testing using fake files from StringIO. The import uses the six package so the example is compatible with both python 2 and 3. It should read line by line so it shouldn't store the entire files in memory as it sounds like people use somewhat larger files for this.


New commits:

d9274c6added satsolver method to read in DIMACS files
malb commented 9 years ago
comment:8

I think file objects are a rare thing in the Sage "API", i.e. as far as I know we don't usually use them. I'm not sure there should be two methods, just one with a filename. Also, aren't "read" and "load" a bit too generic as names given that these are methods on solver instances?

tscholl2 commented 9 years ago
comment:9

Thanks for the comment.

I wasn't sure about the kind of use case this class gets. You mentioned in a comment that you sometimes use files on the order of gigabytes. I thought it might be useful to have easy access to a method which you only need a stream in case you don't have a disk file (for example a memory mapped file object, a StringIO file object, or some kind of stream from stdin). I don't know if you can open those with fopen.

However, if you think that most people just read from a file on the disk, then I could easily remove the load function.

As for the read name, I used read because there was already a write method and it seemed complementary. Do you think parse is a better name?

malb commented 9 years ago
comment:10

You're right read makes sense given that there is a write. How about both functions are merged into one? If the parameter is a string it is considered a file name? Otherwise it is considered a file object?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

623cbedmerged read and load methods
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from d9274c6 to 623cbed

tscholl2 commented 9 years ago
comment:12

That's a great idea. I merged the methods and uploaded the new version.

malb commented 9 years ago
comment:14

Looks good to me.

malb commented 9 years ago

Reviewer: Martin Albrecht

vbraun commented 9 years ago
comment:16

Docs don't build

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 623cbed to 79c7c68

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

79c7c68added an 'r'
tscholl2 commented 9 years ago
comment:18

Added an "r" so the doc string is raw text instead of a string so that the "\n"'s in the example don't crash the doc building.

videlec commented 9 years ago
comment:20

Hello,

The docstring of the function should follow the scheme

def my_function(x,y,z):
    r"""
    one line description

    longer description if needed

    EXAMPLES::

        sage: my_function(1,2,3)

The following is not helpful in the documentation

See https://github.com/sagemath/sage-prod/issues/16924

It would make sense in an AUTHOR section. Or within the TESTS section if it would correspond to some bug fix. Moreover, the syntax is now trac:`16924`.

In your example, I do not understand what you want to show.

sage: solver.read(file_object)
sage: solver
DIMACS Solver: ''

It seems that nothing happened... Would it be possible to print the clauses?

Vincent

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

34d8f00fixed docstring and example
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 79c7c68 to 34d8f00

tscholl2 commented 9 years ago
comment:22

Thanks for the comments!

I changed the doc string as you suggested and changed the example to print out the clauses. It's a lot better for the example.

Please let me know if you notice anything else that doesn't fit in.

videlec commented 9 years ago
comment:23

I hardly found specification for DIMACS files. In http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html they say several things that you are not taking care of:

In other words, as far as I understand, this would be valid

c hey
c ho
p cnf 3 2
1 -2 0 3
1   0

Moreover, it is written In some examples of CNF files, the rule that the variables are numbered from 1 to N is not followed. The file might declare that there are 10 variables, for instance, but allow them to be numbered 2 through 11.

It would be nice if you know some link with the exact specification of the format.

malb commented 9 years ago
comment:24

Replying to @videlec:

  • the comments should only be at the begining

Why enforce that restriction when there is no harm (as far as I can see) in being more liberal?

  • there is one line starting with p cnf NUM_VAR NUM_CLAUSES (the same format is used for other constraint satisfcation problems)

As far as I know, many parsers these days ignore this (source: I chatted to Máté Soos ages ago, I might be misremembering, though)

  • a clause might extend on more than one line and a clause might start on the same line as another one

I did not know about that one.

In other words, as far as I understand, this would be valid

c hey
c ho
p cnf 3 2
1 -2 0 3
1   0
videlec commented 9 years ago
comment:25

Replying to @malb:

Replying to @videlec:

I very naively read the web page I mentioned. I am fine with the current state, but I just wanted to be sure that anybody (that uses this format) agrees with this parser. And of course, I an not the required person to judge that.

One option would be to write down in the docstring that the DIMACS format is not well specified.

Vincent

tscholl2 commented 9 years ago
comment:26

Sorry it took me a while, I'm at a Sage days working on other things. But I asked people here and got directed to a SAT solver competition website http://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html.

I copied their rules here

The clause about "ending with 0 on the same line" means they don't allow the multi-line clauses. So it sounds like the definition of "DIMACS standards" depends on who you ask.

I will modify the docstring to clearly describe the variation it assumes the file is in.

Should I post links to these pages describing different rules for DIMACS files in the docstring as well?

EDIT: Sorry I realized I copied the wrong link. Here is the corect one: http://www.satcompetition.org/2009/format-benchmarks2009.html

malb commented 9 years ago
comment:27

Hi, might as well thrown in the links. Sounds like a good plan.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago
comment:28

If, instead of reimplementing it yourself, you prefer to sweat over GLPK's API, then you can try calling "glp_read_cnfsat" [1]. That's a way to delegate the desponsibility of handling all cases :-P

Nathann

[1] http://elis.dvo.ru/~lab_11/glpk-doc/cnfsat.pdf

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

169849bnoted differences in DIMACS file specs in docs
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 34d8f00 to 169849b

tscholl2 commented 9 years ago
comment:30

I added some lines to the documentation for the read method which explains the differences in DIMACS files with links to various "standards". I rebuilt the docs and everything seemed to be formatted alright.

I thought the method Martin wrote is flexible and simple enough that it wasn't worth it to call (and sweat over) GLPK's definition of DIMAC files. Thank you for pointing out another source though Nathann!

malb commented 9 years ago
comment:31

Looks good to me.

vbraun commented 9 years ago

Changed branch from u/tscholl2/read_dimacs_16924 to 169849b