question about all_preds.jsonl with SWE-Bench Lite

ivan4722 commented 2 weeks ago

Describe the issue

As SWE-Bench Lite has 300 entries, I expected 300 results in all_preds.jsonl, but I have 180. Is this because there are 120 entries that could not produce a result?

Optional: Relevant documentation page

No response

klieret commented 2 weeks ago

Do you still have your logs? What trajectories do you see in your trajectory folder? Hard to help without more information.

ivan4722 commented 2 weeks ago

Do you still have your logs? What trajectories do you see in your trajectory folder? Hard to help without more information.

I dont have my logs but I have my trajectories folder. Some of them might be from previous runs with the dev set, but here:


 all_preds.jsonl              django__django-15695.traj                 pylint-dev__pylint-5859.traj            sphinx-doc__sphinx-8721.traj
 args.yaml                    django__django-15781.traj                 pylint-dev__pylint-6506.traj            sqlfluff__sqlfluff-1625.traj
 astropy__astropy-6938.traj   django__django-15789.traj                 pylint-dev__pylint-7080.traj            sqlfluff__sqlfluff-1733.traj                django__django-15814.traj                 pylint-dev__pylint-7114.traj            sqlfluff__sqlfluff-1763.traj
 django__django-10914.traj    django__django-15996.traj                 pylint-dev__pylint-7228.traj            sympy__sympy-11400.traj
 django__django-11001.traj    django__django-16229.traj                 pylint-dev__pylint-7993.traj            sympy__sympy-11897.traj
 django__django-11039.traj    django__django-16379.traj                 pytest-dev__pytest-11143.traj           sympy__sympy-12419.traj
 django__django-11133.traj    django__django-16527.traj                 pytest-dev__pytest-11148.traj           sympy__sympy-12481.traj
 django__django-11179.traj    django__django-16820.traj                 pytest-dev__pytest-5103.traj            sympy__sympy-13043.traj
 django__django-11564.traj    django__django-16873.traj                 pytest-dev__pytest-5221.traj            sympy__sympy-13471.traj
 django__django-11583.traj    django__django-17051.traj                 pytest-dev__pytest-5227.traj            sympy__sympy-13480.traj
 django__django-11620.traj   'import pandas as pd.py'                   pytest-dev__pytest-5692.traj            sympy__sympy-13647.traj
 django__django-11742.traj    marshmallow-code__marshmallow-1359.traj   pytest-dev__pytest-7168.traj            sympy__sympy-13915.traj
 django__django-11797.traj    matplotlib__matplotlib-18869.traj         pytest-dev__pytest-7373.traj            sympy__sympy-13971.traj
 django__django-11815.traj    matplotlib__matplotlib-22835.traj         pytest-dev__pytest-7432.traj            sympy__sympy-14024.traj
 django__django-11964.traj    matplotlib__matplotlib-23299.traj         pytest-dev__pytest-8365.traj            sympy__sympy-14308.traj
 django__django-12113.traj    matplotlib__matplotlib-23476.traj         pytest-dev__pytest-8906.traj            sympy__sympy-14396.traj
 django__django-12125.traj    matplotlib__matplotlib-23562.traj         pytest-dev__pytest-9359.traj            sympy__sympy-14774.traj
 django__django-12184.traj    matplotlib__matplotlib-23563.traj         pyvista__pyvista-4315.traj              sympy__sympy-15345.traj
 django__django-12286.traj    matplotlib__matplotlib-23964.traj         scikit-learn__scikit-learn-10508.traj   sympy__sympy-15346.traj
 django__django-12470.traj    matplotlib__matplotlib-24149.traj         scikit-learn__scikit-learn-10949.traj   sympy__sympy-15678.traj
 django__django-12700.traj    matplotlib__matplotlib-24334.traj         scikit-learn__scikit-learn-11040.traj   sympy__sympy-16106.traj
 django__django-12708.traj    matplotlib__matplotlib-25079.traj         scikit-learn__scikit-learn-11281.traj   sympy__sympy-16281.traj
 django__django-12747.traj    matplotlib__matplotlib-25311.traj         scikit-learn__scikit-learn-12471.traj   sympy__sympy-16503.traj
 django__django-12856.traj    matplotlib__matplotlib-25433.traj         scikit-learn__scikit-learn-13142.traj   sympy__sympy-16792.traj
 django__django-13033.traj    matplotlib__matplotlib-25442.traj         scikit-learn__scikit-learn-13241.traj   sympy__sympy-16988.traj
 django__django-13315.traj    matplotlib__matplotlib-25498.traj         scikit-learn__scikit-learn-13496.traj   sympy__sympy-17022.traj
 django__django-13321.traj    mwaskom__seaborn-2848.traj                scikit-learn__scikit-learn-13584.traj   sympy__sympy-18199.traj
 django__django-13401.traj    mwaskom__seaborn-3010.traj                scikit-learn__scikit-learn-13779.traj   sympy__sympy-18621.traj
 django__django-13551.traj    mwaskom__seaborn-3190.traj                scikit-learn__scikit-learn-14087.traj   sympy__sympy-19487.traj
 django__django-13590.traj    patches                                   scikit-learn__scikit-learn-14092.traj   sympy__sympy-20049.traj
 django__django-13710.traj    psf__requests-1963.traj                   scikit-learn__scikit-learn-15535.traj   sympy__sympy-20154.traj
 django__django-13757.traj    psf__requests-2148.traj                   scikit-learn__scikit-learn-25570.traj   sympy__sympy-20212.traj
 django__django-13933.traj    psf__requests-2317.traj                   scikit-learn__scikit-learn-25747.traj   sympy__sympy-20322.traj
 django__django-13964.traj    psf__requests-2674.traj                   sphinx-doc__sphinx-10325.traj           sympy__sympy-20639.traj
 django__django-14017.traj    psf__requests-863.traj                    sphinx-doc__sphinx-10451.traj           sympy__sympy-21055.traj
 django__django-14411.traj    pydata__xarray-3364.traj                  sphinx-doc__sphinx-11445.traj           sympy__sympy-21171.traj
 django__django-14534.traj    pydata__xarray-4094.traj                  sphinx-doc__sphinx-7738.traj            sympy__sympy-21379.traj
 django__django-14580.traj    pydata__xarray-4248.traj                  sphinx-doc__sphinx-7975.traj            sympy__sympy-21612.traj
 django__django-14915.traj    pydata__xarray-4493.traj                  sphinx-doc__sphinx-8273.traj            sympy__sympy-21627.traj
 django__django-14999.traj    pydicom__pydicom-1256.traj                sphinx-doc__sphinx-8282.traj            sympy__sympy-21847.traj
 django__django-15202.traj    pydicom__pydicom-1694.traj                sphinx-doc__sphinx-8435.traj            sympy__sympy-22714.traj
 django__django-15320.traj    pydicom__pydicom-901.traj                 sphinx-doc__sphinx-8474.traj            sympy__sympy-22840.traj
 django__django-15388.traj    pylint-dev__astroid-1196.traj             sphinx-doc__sphinx-8506.traj            sympy__sympy-24066.traj
 django__django-15400.traj    pylint-dev__astroid-1268.traj             sphinx-doc__sphinx-8595.traj            sympy__sympy-24102.traj
 django__django-15498.traj    pylint-dev__astroid-1978.traj             sphinx-doc__sphinx-8713.traj            sympy__sympy-24152.traj```

klieret commented 2 weeks ago

Can you copy your args.yaml here? Just so I see what you ran. And can you check if the number of trajectories corresponds to the number of predictions that you see (ls *.traj|wc -l)?

ivan4722 commented 2 weeks ago

Can you copy your args.yaml here? Just so I see what you ran. And can you check if the number of trajectories corresponds to the number of predictions that you see (ls *.traj|wc -l)?

args.yaml:

@ivan4722 ➜ .../SWE-agent/trajectories/vscode/gpt4__SWE-bench_Lite__default__t-0.00__p-0.95__c-2.00__install-1 (main) $ cat args.yaml
actions:
  apply_patch_locally: false
  open_pr: false
  push_gh_repo_url: ''
  skip_if_commits_reference_issue: true
agent:
  config:
    _commands:
    - arguments:
        line_number:
          description: the line number to move the window to (if not provided, the
            window will start at the top of the file)
          required: false
          type: integer
        path:
          description: the path to the file to open
          required: true
          type: string
      code: 'open() {    if [ -z "$1" ]    then        echo "Usage: open <file>"        return    fi    #
        Check if the second argument is provided    if [ -n "$2" ]; then        #
        Check if the provided argument is a valid number        if ! [[ $2 =~ ^[0-9]+$
        ]]; then            echo "Usage: open <file> [<line_number>]"            echo
        "Error: <line_number> must be a number"            return  # Exit if the line
        number is not valid        fi        local max_line=$(awk ''END {print NR}''
        $1)        if [ $2 -gt $max_line ]; then            echo "Warning: <line_number>
        ($2) is greater than the number of lines in the file ($max_line)"            echo
        "Warning: Setting <line_number> to $max_line"            local line_number=$(jq
        -n "$max_line")  # Set line number to max if greater than max        elif
        [ $2 -lt 1 ]; then            echo "Warning: <line_number> ($2) is less than
        1"            echo "Warning: Setting <line_number> to 1"            local
        line_number=$(jq -n "1")  # Set line number to 1 if less than 1        else            local
        OFFSET=$(jq -n "$WINDOW/6" | jq ''floor'')            local line_number=$(jq
        -n "[$2 + $WINDOW/2 - $OFFSET, 1] | max | floor")        fi    else        local
        line_number=$(jq -n "$WINDOW/2")  # Set default line number if not provided    fi    if
        [ -f "$1" ]; then        export CURRENT_FILE=$(realpath $1)        export
        CURRENT_LINE=$line_number        _constrain_line        _print    elif [ -d
        "$1" ]; then        echo "Error: $1 is a directory. You can only open files.
        Use cd or ls to navigate directories."    else        echo "File $1 not found"    fi}'
      docstring: opens the file at the given path in the editor. If line_number is
        provided, the window will be move to include that line
      end_name: null
      name: open
      signature: open <path> [<line_number>]
    - arguments:
        line_number:
          description: the line number to move the window to
          required: true
          type: integer
      code: 'goto() {    if [ $# -gt 1 ]; then        echo "goto allows only one line
        number at a time."        return    fi    if [ -z "$CURRENT_FILE" ]    then        echo
        "No file open. Use the open command first."        return    fi    if [ -z
        "$1" ]    then        echo "Usage: goto <line>"        return    fi    if
        ! [[ $1 =~ ^[0-9]+$ ]]    then        echo "Usage: goto <line>"        echo
        "Error: <line> must be a number"        return    fi    local max_line=$(awk
        ''END {print NR}'' $CURRENT_FILE)    if [ $1 -gt $max_line ]    then        echo
        "Error: <line> must be less than or equal to $max_line"        return    fi    local
        OFFSET=$(jq -n "$WINDOW/6" | jq ''floor'')    export CURRENT_LINE=$(jq -n
        "[$1 + $WINDOW/2 - $OFFSET, 1] | max | floor")    _constrain_line    _print}'
      docstring: moves the window to show <line_number>
      end_name: null
      name: goto
      signature: goto <line_number>
    - arguments: null
      code: scroll_down() {    if [ -z "$CURRENT_FILE" ]    then        echo "No file
        open. Use the open command first."        return    fi    export CURRENT_LINE=$(jq
        -n "$CURRENT_LINE + $WINDOW - $OVERLAP")    _constrain_line    _print}
      docstring: moves the window down {WINDOW} lines
      end_name: null
      name: scroll_down
      signature: scroll_down
    - arguments: null
      code: scroll_up() {    if [ -z "$CURRENT_FILE" ]    then        echo "No file
        open. Use the open command first."        return    fi    export CURRENT_LINE=$(jq
        -n "$CURRENT_LINE - $WINDOW + $OVERLAP")    _constrain_line    _print}
      docstring: moves the window down {WINDOW} lines
      end_name: null
      name: scroll_up
      signature: scroll_up
    - arguments:
        filename:
          description: the name of the file to create
          required: true
          type: string
      code: "create() {    if [ -z \"$1\" ]; then        echo \"Usage: create <filename>\"\
        \        return    fi    # Check if the file already exists    if [ -e \"\
        $1\" ]; then        echo \"Error: File '$1' already exists.\"\t\topen \"$1\"\
        \        return    fi    # Create the file an empty new line    printf \"\\\
        n\" > \"$1\"    # Use the existing open command to open the created file \
        \   open \"$1\"}"
      docstring: creates and opens a new file with the given name
      end_name: null
      name: create
      signature: create <filename>
    - arguments: null
      code: 'submit() {    cd $ROOT    # Check if the patch file exists and is non-empty    if
        [ -s "/root/test.patch" ]; then        # Apply the patch in reverse        git
        apply -R < "/root/test.patch"    fi    git add -A    git diff --cached > model.patch    echo
        "<<SUBMISSION||"    cat model.patch    echo "||SUBMISSION>>"}'
      docstring: submits your current code and terminates the session
      end_name: null
      name: submit
      signature: submit
    - arguments:
        dir:
          description: the directory to search in (if not provided, searches in the
            current directory)
          required: false
          type: string
        search_term:
          description: the term to search for
          required: true
          type: string
      code: 'search_dir() {    if [ $# -eq 1 ]; then        local search_term="$1"        local
        dir="./"    elif [ $# -eq 2 ]; then        local search_term="$1"        if
        [ -d "$2" ]; then            local dir="$2"        else            echo "Directory
        $2 not found"            return        fi    else        echo "Usage: search_dir
        <search_term> [<dir>]"        return    fi    dir=$(realpath "$dir")    local
        matches=$(find "$dir" -type f ! -path ''*/.*'' -exec grep -nIH -- "$search_term"
        {} + | cut -d: -f1 | sort | uniq -c)    # if no matches, return    if [ -z
        "$matches" ]; then        echo "No matches found for \"$search_term\" in $dir"        return    fi    #
        Calculate total number of matches    local num_matches=$(echo "$matches" |
        awk ''{sum+=$1} END {print sum}'')    # calculate total number of files matched    local
        num_files=$(echo "$matches" | wc -l | awk ''{$1=$1; print $0}'')    # if num_files
        is > 100, print an error    if [ $num_files -gt 100 ]; then        echo "More
        than $num_files files matched for \"$search_term\" in $dir. Please narrow
        your search."        return    fi    echo "Found $num_matches matches for
        \"$search_term\" in $dir:"    echo "$matches" | awk ''{$2=$2; gsub(/^\.+\/+/,
        "./", $2); print $2 " ("$1" matches)"}''    echo "End of matches for \"$search_term\"
        in $dir"}'
      docstring: searches for search_term in all files in dir. If dir is not provided,
        searches in the current directory
      end_name: null
      name: search_dir
      signature: search_dir <search_term> [<dir>]
    - arguments:
        file:
          description: the file to search in (if not provided, searches in the current
            open file)
          required: false
          type: string
        search_term:
          description: the term to search for
          required: true
          type: string
      code: 'search_file() {    # Check if the first argument is provided    if [
        -z "$1" ]; then        echo "Usage: search_file <search_term> [<file>]"        return    fi    #
        Check if the second argument is provided    if [ -n "$2" ]; then        #
        Check if the provided argument is a valid file        if [ -f "$2" ]; then            local
        file="$2"  # Set file if valid        else            echo "Usage: search_file
        <search_term> [<file>]"            echo "Error: File name $2 not found. Please
        provide a valid file name."            return  # Exit if the file is not valid        fi    else        #
        Check if a file is open        if [ -z "$CURRENT_FILE" ]; then            echo
        "No file open. Use the open command first."            return  # Exit if no
        file is open        fi        local file="$CURRENT_FILE"  # Set file to the
        current open file    fi    local search_term="$1"    file=$(realpath "$file")    #
        Use grep to directly get the desired formatted output    local matches=$(grep
        -nH -- "$search_term" "$file")    # Check if no matches were found    if [
        -z "$matches" ]; then        echo "No matches found for \"$search_term\" in
        $file"        return    fi    # Calculate total number of matches    local
        num_matches=$(echo "$matches" | wc -l | awk ''{$1=$1; print $0}'')    # calculate
        total number of lines matched    local num_lines=$(echo "$matches" | cut -d:
        -f1 | sort | uniq | wc -l | awk ''{$1=$1; print $0}'')    # if num_lines is
        > 100, print an error    if [ $num_lines -gt 100 ]; then        echo "More
        than $num_lines lines matched for \"$search_term\" in $file. Please narrow
        your search."        return    fi    # Print the total number of matches and
        the matches themselves    echo "Found $num_matches matches for \"$search_term\"
        in $file:"    echo "$matches" | cut -d: -f1-2 | sort -u -t: -k2,2n | while
        IFS=: read -r filename line_number; do        echo "Line $line_number:$(sed
        -n "${line_number}p" "$file")"    done    echo "End of matches for \"$search_term\"
        in $file"}'
      docstring: searches for search_term in file. If file is not provided, searches
        in the current open file
      end_name: null
      name: search_file
      signature: search_file <search_term> [<file>]
    - arguments:
        dir:
          description: the directory to search in (if not provided, searches in the
            current directory)
          required: false
          type: string
        file_name:
          description: the name of the file to search for
          required: true
          type: string
      code: 'find_file() {    if [ $# -eq 1 ]; then        local file_name="$1"        local
        dir="./"    elif [ $# -eq 2 ]; then        local file_name="$1"        if
        [ -d "$2" ]; then            local dir="$2"        else            echo "Directory
        $2 not found"            return        fi    else        echo "Usage: find_file
        <file_name> [<dir>]"        return    fi    dir=$(realpath "$dir")    local
        matches=$(find "$dir" -type f -name "$file_name")    # if no matches, return    if
        [ -z "$matches" ]; then        echo "No matches found for \"$file_name\" in
        $dir"        return    fi    # Calculate total number of matches    local
        num_matches=$(echo "$matches" | wc -l | awk ''{$1=$1; print $0}'')    echo
        "Found $num_matches matches for \"$file_name\" in $dir:"    echo "$matches"
        | awk ''{print $0}''}'
      docstring: finds all files with the given name in dir. If dir is not provided,
        searches in the current directory
      end_name: null
      name: find_file
      signature: find_file <file_name> [<dir>]
    - arguments:
        end_line:
          description: the line number to end the edit at (inclusive)
          required: true
          type: integer
        replacement_text:
          description: the text to replace the current selection with
          required: true
          type: string
        start_line:
          description: the line number to start the edit at
          required: true
          type: integer
      code: 'edit() {    if [ -z "$CURRENT_FILE" ]    then        echo ''No file open.
        Use the `open` command first.''        return    fi    local start_line="$(echo
        $1: | cut -d: -f1)"    local end_line="$(echo $1: | cut -d: -f2)"    if [
        -z "$start_line" ] || [ -z "$end_line" ]    then        echo "Usage: edit
        <start_line>:<end_line>"        return    fi    local re=''^[0-9]+$''    if
        ! [[ $start_line =~ $re ]]; then        echo "Usage: edit <start_line>:<end_line>"        echo
        "Error: start_line must be a number"        return    fi    if ! [[ $end_line
        =~ $re ]]; then        echo "Usage: edit <start_line>:<end_line>"        echo
        "Error: end_line must be a number"        return    fi    # Bash array starts
        at 0, so let''s adjust    local start_line=$((start_line - 1))    local end_line=$((end_line))    local
        line_count=0    local replacement=()    while IFS= read -r line    do        replacement+=("$line")        ((line_count++))    done    #
        Create a backup of the current file    cp "$CURRENT_FILE" "/root/$(basename
        "$CURRENT_FILE")_backup"    # Read the file line by line into an array    mapfile
        -t lines < "$CURRENT_FILE"    local new_lines=("${lines[@]:0:$start_line}"
        "${replacement[@]}" "${lines[@]:$((end_line))}")    # Write the new stuff
        directly back into the original file    printf "%s\n" "${new_lines[@]}" >|
        "$CURRENT_FILE"    # Run linter    if [[ $CURRENT_FILE == *.py ]]; then        lint_output=$(flake8
        --isolated --select=F821,F822,F831,E111,E112,E113,E999,E902 "$CURRENT_FILE"
        2>&1)    else        # do nothing        lint_output=""    fi    # if there
        is no output, then the file is good    if [ -z "$lint_output" ]; then        export
        CURRENT_LINE=$start_line        _constrain_line        _print        echo
        "File updated. Please review the changes and make sure they are correct (correct
        indentation, no duplicate lines, etc). Edit the file again if necessary."    else        echo
        "Your proposed edit has introduced new syntax error(s). Please read this error
        message carefully and then retry editing the file."        echo ""        echo
        "ERRORS:"        _split_string "$lint_output"        echo ""        # Save
        original values        original_current_line=$CURRENT_LINE        original_window=$WINDOW        #
        Update values        export CURRENT_LINE=$(( (line_count / 2) + start_line
        )) # Set to "center" of edit        export WINDOW=$((line_count + 10)) # Show
        +/- 5 lines around edit        echo "This is how your edit would have looked
        if applied"        echo "-------------------------------------------------"        _constrain_line        _print        echo
        "-------------------------------------------------"        echo ""        #
        Restoring CURRENT_FILE to original contents.        cp "/root/$(basename "$CURRENT_FILE")_backup"
        "$CURRENT_FILE"        export CURRENT_LINE=$(( ((end_line - start_line + 1)
        / 2) + start_line ))        export WINDOW=$((end_line - start_line + 10))        echo
        "This is the original code before your edit"        echo "-------------------------------------------------"        _constrain_line        _print        echo
        "-------------------------------------------------"        # Restore original
        values        export CURRENT_LINE=$original_current_line        export WINDOW=$original_window        echo
        "Your changes have NOT been applied. Please fix your edit command and try
        again."        echo "You either need to 1) Specify the correct start/end line
        arguments or 2) Correct your edit code."        echo "DO NOT re-run the same
        failed edit command. Running it again will lead to the same error."    fi    #
        Remove backup file    rm -f "/root/$(basename "$CURRENT_FILE")_backup"}'
      docstring: replaces lines <start_line> through <end_line> (inclusive) with the
        given text in the open file. The replacement text is terminated by a line
        with only end_of_edit on it. All of the <replacement text> will be entered,
        so make sure your indentation is formatted properly. Python files will be
        checked for syntax errors after the edit. If the system detects a syntax error,
        the edit will not be executed. Simply try to edit the file again, but make
        sure to read the error message and modify the edit command you issue accordingly.
        Issuing the same command a second time will just lead to the same error message
        again.
      end_name: end_of_edit
      name: edit
      signature: |-
        edit <start_line>:<end_line>
        <replacement_text>
        end_of_edit
    _subroutines: {}
    blocklist:
    - vim
    - vi
    - emacs
    - nano
    - nohup
    - git
    blocklist_error_template: Interactive operation '{name}' is not supported by this
      environment
    blocklist_standalone:
    - python
    - python3
    - ipython
    - bash
    - sh
    - exit
    - /bin/bash
    - /bin/sh
    - nohup
    - vi
    - vim
    - emacs
    - nano
    command_docs: |+
      open:
        docstring: opens the file at the given path in the editor. If line_number is provided, the window will be move to include that line
        signature: open <path> [<line_number>]
        arguments:
          - path (string) [required]: the path to the file to open
          - line_number (integer) [optional]: the line number to move the window to (if not provided, the window will start at the top of the file)

      goto:
        docstring: moves the window to show <line_number>
        signature: goto <line_number>
        arguments:
          - line_number (integer) [required]: the line number to move the window to

      scroll_down:
        docstring: moves the window down {WINDOW} lines
        signature: scroll_down

      scroll_up:
        docstring: moves the window down {WINDOW} lines
        signature: scroll_up

      create:
        docstring: creates and opens a new file with the given name
        signature: create <filename>
        arguments:
          - filename (string) [required]: the name of the file to create

      submit:
        docstring: submits your current code and terminates the session
        signature: submit

      search_dir:
        docstring: searches for search_term in all files in dir. If dir is not provided, searches in the current directory
        signature: search_dir <search_term> [<dir>]
        arguments:
          - search_term (string) [required]: the term to search for
          - dir (string) [optional]: the directory to search in (if not provided, searches in the current directory)

      search_file:
        docstring: searches for search_term in file. If file is not provided, searches in the current open file
        signature: search_file <search_term> [<file>]
        arguments:
          - search_term (string) [required]: the term to search for
          - file (string) [optional]: the file to search in (if not provided, searches in the current open file)

      find_file:
        docstring: finds all files with the given name in dir. If dir is not provided, searches in the current directory
        signature: find_file <file_name> [<dir>]
        arguments:
          - file_name (string) [required]: the name of the file to search for
          - dir (string) [optional]: the directory to search in (if not provided, searches in the current directory)

      edit:
        docstring: replaces lines <start_line> through <end_line> (inclusive) with the given text in the open file. The replacement text is terminated by a line with only end_of_edit on it. All of the <replacement text> will be entered, so make sure your indentation is formatted properly. Python files will be checked for syntax errors after the edit. If the system detects a syntax error, the edit will not be executed. Simply try to edit the file again, but make sure to read the error message and modify the edit command you issue accordingly. Issuing the same command a second time will just lead to the same error message again.
        signature: edit <start_line>:<end_line>
      <replacement_text>
      end_of_edit
        arguments:
          - start_line (integer) [required]: the line number to start the edit at
          - end_line (integer) [required]: the line number to end the edit at (inclusive)
          - replacement_text (string) [required]: the text to replace the current selection with

    command_files:
    - /workspaces/SWE-agent/config/commands/defaults.sh
    - /workspaces/SWE-agent/config/commands/search.sh
    - /workspaces/SWE-agent/config/commands/edit_linting.sh
    - /workspaces/SWE-agent/config/commands/_split_string.py
    demonstration_template: |
      Here is a demonstration of how to correctly accomplish this task.
      It is included to show you how to correctly use the interface.
      You do not need to follow exactly what is done in the demonstration.
      --- DEMONSTRATION ---
      {demonstration}
      --- END OF DEMONSTRATION ---
    demonstrations:
    - /workspaces/SWE-agent/trajectories/demonstrations/replay__marshmallow-code__marshmallow-1867__default_sys-env_window100__t-0.20__p-0.95__c-2.00__install-1/marshmallow-code__marshmallow-1867.traj
    env_variables:
      CURRENT_FILE: ''
      CURRENT_LINE: '0'
      OVERLAP: '2'
      SEARCH_FILES: ()
      SEARCH_INDEX: '0'
      SEARCH_RESULTS: ()
      WINDOW: '100'
    format_error_template: |
      Your output was not formatted correctly. You must always include one discussion and one command as part of your response. Make sure you do not have multiple discussion/command tags.
      Please make sure your output precisely matches the following format:
      DISCUSSION
      Discuss here with yourself about what your planning and what you're going to do in this step.

  command(s) that you're going to run
  ```
history_processor: {}
history_processor_args: {}
instance_template: |-
  We're currently solving the following issue within our repository. Here's the issue text:
  ISSUE:
  {issue}

  INSTRUCTIONS:
  Now, you're going to solve this issue on your own. Your terminal session has started and you're in the repository's root directory. You can use any bash commands or the special interface to help you. Edit all the files you need to and run any checks or tests that you want.
  Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
  When you're satisfied with all of the changes you've made, you can submit your changes to the code base by simply running the submit command.
  Note however that you cannot use any interactive session commands (e.g. python, vim) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.

  NOTE ABOUT THE EDIT COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!

  IMPORTANT TIPS:
  1. Always start by trying to replicate the bug that the issues discusses.
     If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment, and run it to make sure you can reproduce the bug.
     Then start trying to fix it.
     When you think you've fixed the bug, re-run the bug reproduction script to make sure that the bug has indeed been fixed.

     If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file,
     so that you can be sure that the script indeed ran fine all the way through.

  2. If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it!

  3. If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, don't just use the scroll_down command multiple times. Instead, use the goto 583 command. It's much quicker.

  4. If the bug reproduction script requires inputting/reading a specific file, such as buggy-input.png, and you'd like to understand how to input that file, conduct a search in the existing repo code, to see whether someone else has already done that. Do this by running the command: find_file "buggy-input.png" If that doesn't work, use the linux 'find' command.

  5. Always make sure to look at the currently open file and the current working directory (which appears right after the currently open file). The currently open file might be in a different directory than the working directory! Note that some commands, such as 'create', open files, so they might change the current  open file.

  6. When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.

  (Open file: {open_file})
  (Current directory: {working_dir})
  bash-$
next_step_no_output_template: |-
  Your command ran successfully and did not produce any output.
  (Open file: {open_file})
  (Current directory: {working_dir})
  bash-$
next_step_template: |-
  {observation}
  (Open file: {open_file})
  (Current directory: {working_dir})
  bash-$
parse_command: {}
parse_function: {}
put_demos_in_history: false
state_command:
  arguments: null
  code: |
    state() {
      local working_dir="$PWD";
      if [ -z $CURRENT_FILE ]; then
          echo '{"open_file": "n/a", "working_dir": "'$working_dir'"}';
      else
          echo '{"open_file": "'$(realpath $CURRENT_FILE)'", "working_dir": "'$working_dir'"}';
      fi
    };
  docstring: null
  end_name: null
  name: state
  signature: null
strategy_template: null
submit_command: submit
subroutine_types: []
system_template: |-
  SETTING: You are an autonomous programmer, and you're working directly in the command line with a special interface.

  The special interface consists of a file editor that shows you {WINDOW} lines of a file at a time.
  In addition to typical bash commands, you can also use the following commands to help you navigate and edit files.

  COMMANDS:
  {command_docs}

  Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION.
  If you'd like to add the line '        print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.

  RESPONSE FORMAT:
  Your shell prompt is formatted as follows:
  (Open file: <path>) <cwd> $

  You need to format your output using two fields; discussion and command.
  Your output should always include _one_ discussion and _one_ command field EXACTLY as in the following example:
  DISCUSSION
  First I'll start by using ls to see what files are in the current directory. Then maybe we can look at some relevant files to see what they look like.
  ```
  ls -a
  ```

  You should only include a *SINGLE* command in the command section and then wait for a response from the shell before continuing with more discussion and commands. Everything you include in the DISCUSSION section will be saved for future reference.
  If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first command, and then after receiving a response you'll be able to issue the second command.
  You're free to use any other bash commands you want (e.g. find, grep, cat, ls, cd) in addition to the special commands listed above.
  However, the environment does NOT support interactive session commands (e.g. python, vim), so please do not invoke them.
util_functions:
- arguments: null
  code: '_print() {    local total_lines=$(awk ''END {print NR}'' $CURRENT_FILE)    echo
    "[File: $(realpath $CURRENT_FILE) ($total_lines lines total)]"    lines_above=$(jq
    -n "$CURRENT_LINE - $WINDOW/2" | jq ''[0, .] | max | floor'')    lines_below=$(jq
    -n "$total_lines - $CURRENT_LINE - $WINDOW/2" | jq ''[0, .] | max | round'')    if
    [ $lines_above -gt 0 ]; then        echo "($lines_above more lines above)"    fi    cat
    $CURRENT_FILE | grep -n $ | head -n $(jq -n "[$CURRENT_LINE + $WINDOW/2, $WINDOW/2]
    | max | floor") | tail -n $(jq -n "$WINDOW")    if [ $lines_below -gt 0 ];
    then        echo "($lines_below more lines below)"    fi}'
  docstring: null
  end_name: null
  name: _print
  signature: _print
- arguments: null
  code: _constrain_line() {    if [ -z "$CURRENT_FILE" ]    then        echo "No
    file open. Use the open command first."        return    fi    local max_line=$(awk
    'END {print NR}' $CURRENT_FILE)    local half_window=$(jq -n "$WINDOW/2" |
    jq 'floor')    export CURRENT_LINE=$(jq -n "[$CURRENT_LINE, $max_line - $half_window]
    | min")    export CURRENT_LINE=$(jq -n "[$CURRENT_LINE, $half_window] | max")}
  docstring: null
  end_name: null
  name: _constrain_line
  signature: _constrain_line

config_file: config/default.yaml model: host_url: localhost:11434 model_name: gpt4 per_instance_cost_limit: 2.0 replay_path: null temperature: 0.0 top_p: 0.95 total_cost_limit: 0.0 environment: base_commit: null cache_task_images: false container_name: null data_path: princeton-nlp/SWE-bench_Lite environment_setup: null image_name: sweagent/swe-agent:latest install_environment: true no_mirror: false repo_path: '' split: test timeout: null verbose: true instance_filter: .* print_config: true raise_exceptions: false skip_existing: false suffix: ''



I see 179 with ```*.traj``` command

klieret commented 2 weeks ago

Some of them might be from previous runs with the dev set

That shouldn't happen, usually. This should be encoded in the name of the directory.

ivan4722 commented 2 weeks ago

Some of them might be from previous runs with the dev set

That shouldn't happen, usually. This should be encoded in the name of the directory.

I was trying to run eval with SWE-bench, and I had to manually delete some from all_preds.jsonl (such as the sqlfluff entries that are part of dev not test)

klieret commented 2 weeks ago

I think it's most likely that the missing results are for instances where the execution environment for the instance could not be installed. You can get a list of the instances

from datasets import load_dataset
ds = load_dataset("princeton-nlp/SWE-bench_Lite")

and then check what is missing and then try to run swe-agent again on one of the instances with --instance_filter (also see here) to see if you can reproduce the problem.

ivan4722 commented 2 weeks ago

I think it's most likely that the missing results are for instances where the execution environment for the instance could not be installed. You can get a list of the instances
from datasets import load_dataset
ds = load_dataset("princeton-nlp/SWE-bench_Lite")
and then check what is missing and then try to run swe-agent again on one of the instances with --instance_filter (also see here) to see if you can reproduce the problem.

Is benchmarking on SWE-Bench still possible with the missing results?

If the execution environment could not be installed is there any way around that?

klieret commented 2 weeks ago

I actually just fixed bugs today that could have caused installation issues. You can always re-run SWE-agent. It should skip instances that it has already generated trajectories for.

I also added functionality that always saves the logs, so next time we can look at these if you're missing predictions.

ivan4722 commented 6 days ago

Hey,

I ran SWE-agent with GPT 3.5 on SWE-bench-lite, but I only have 183 predictions.

I have the logs here (600k lines though): https://github.com/ivan4722/gpt-3.5-trajectories/blob/main/run-240621142620.log

Also a question: Can i run SWE-bench evaluation with missing trajectories? I am wondering if that is why I am getting errors when trying to run SWE-bench evaluation.

Thanks!

princeton-nlp / SWE-agent

question about all_preds.jsonl with SWE-Bench Lite #580

Describe the issue

Optional: Relevant documentation page